ABSTRACT



A method for organizing and programming distributed computer
systems in which processors are connected via interconnection
or communication networks, such that even if many hardware
and software faults occur within the processors or within the
networks, such faults do not lead to failures in the
application's computation. Parallel and asynchronous execution
of multiple versions of a program module is performed with
processors which are connected by a network, without involving
any direct interaction between the processors during the
execution of the same or different versions of the program
module. The process
includes a step for executing, for each of the distributed 
program modules, the primary version and its backup ver- 
sion concurrently by use of multiple processors, a step for 
checking, in each processor, the logical acceptability of the 
output data produced from its execution of a program 
module version, a step for sending, in each processor, the 
acceptable output data to the transmission paths, a step for 
receiving, in each processor, the message from the trans- 
mission paths, checking the logical acceptability of each 
received message, and detecting the messages belonging to 
the same program module, and a step for selecting, in each 
processor, a message based on a selection logic. 

18 Claims, 5 Drawing Sheets

received from the network, such as transmission line 27 of
FIG. 1, the logic flows to an "INPUT AT" block 303, and
then to an "IS AT OK?" block 305. Blocks 303 and 305 in
essence perform an input acceptance test for the received
message. This input AT is the same as the test described in
the first embodiment. In the event that AT is not OK, the
logic proceeds to the end of the main sequence, as will be
discussed later.

In the case where the result of the input AT is not OK, the
message received is not used and is deleted. In the case
where the result of the input AT is OK, the logic flows to a
"STATE STORING" block 307. In block 307, the status
information which is necessary for rollback to this point
after the execution of the application module is saved in a
buffer. Then, the logic moves to an "INCREMENTATION OF
COUNTER" block 309, which causes the counter areas in
the module information table 165 of FIG. 3 that correspond
to the application module to be executed to be incremented.
The logic next moves to an "EXECUTION OF FIRST
VERSION" block 311. The logic of block 311 causes the
first version module on the processors, namely Ap 71 on the
processor 11 and Ab 73 on the processor 13, to be executed.
In a pair of blocks, "RESULT AT" 313 and "IS AT OK?" 315,
the acceptance test is done for the output data of the
respective modules Ap 71 and Ab 73. This result AT is the
same as the test described in the first embodiment. If the
result of the test is OK, the logic moves to a "MESSAGE
SENDING" block 317, where a message is prepared in the format
of FIG. 2 and is sent to the network, including transmission
line 27 of FIG. 1, by way of the network interface module
151 as was shown in FIG. 3. The logic then flows to a
"RETURN" block 319. The "RETURN" block 319 also receives
a logic flow from the "IS AT OK?" block 305 in the event
that the input AT fails.

In the case where the result AT is not OK at block 315, the
logic flows to a "MULTIPLE VERSION?" block 321, where
an inquiry into whether or not there are multiple versions is
made. If there are not multiple versions, the logic flows to
the "RETURN" block 319. If there are multiple versions, the
logic flows to a "STATE RESTORING" block 323, where
the status before the execution of the first version module Ap
71 is recovered based upon the information that was stored
during the execution of logic block 307. The logic then
proceeds to an "EXECUTION OF SECOND VERSION"
block 325, causing the second version module, Ab 73 on the
processor 11 and Ap 71 on the processor 13, to be executed.
The logic then flows to a "RESULT AT" block 327 and an
"IS AT OK?" block 329, in which the acceptance test is done
for the output data of the respective modules Ap 71 in
processor 13 and Ab 73 in processor 11. If the acceptance test
is met, the logic flows to the "MESSAGE SENDING" block
317, and a message is prepared in the format of FIG. 2 and
is sent to the network, including transmission line 27 of FIG.
1, by way of the network interface module 151 as was shown
in FIG. 3. If the acceptance test is not met, the logic flows
to the "RETURN" block 319.

However, in the case where the logic flows from the "IS AT
OK?" block 329 to the "MESSAGE SENDING" block 317
and a message is sent, the output message includes
information which indicates that the message occurred as the
result of a retry, in addition to the formatting which was
illustrated in FIG. 2.
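
The flow of blocks 303 through 329 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `run_recovery_block`, `OutputMessage`, `input_at` and `result_at` are hypothetical, the process state is assumed to be a plain dictionary, and the counter update of block 309 is omitted.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple


@dataclass
class OutputMessage:
    """Hypothetical stand-in for the message format of FIG. 2."""
    result: Any
    version_index: int  # 0 = first version, 1 = second version
    retry: bool         # True when the result came from a retry after rollback


def run_recovery_block(
    state: dict,
    input_data: Any,
    versions: Sequence[Callable[[dict, Any], Any]],
    input_at: Callable[[Any], bool],
    result_at: Callable[[Any], bool],
) -> Tuple[dict, Optional[OutputMessage]]:
    """Try each version in order; return (state, message or None)."""
    # Blocks 303/305: input acceptance test; a rejected message is dropped.
    if not input_at(input_data):
        return state, None
    # Block 307: save the state needed for a later rollback.
    checkpoint = copy.deepcopy(state)
    for index, version in enumerate(versions):
        # Block 323 (on retry): every attempt starts from the saved state.
        working = copy.deepcopy(checkpoint)
        # Blocks 311/325: execute the first, then the second version.
        result = version(working, input_data)
        # Blocks 313/315 and 327/329: result acceptance test.
        if result_at(result):
            # Block 317: the message records whether it came from a retry.
            return working, OutputMessage(result, index, retry=index > 0)
    # Block 319: no version passed, so nothing is sent.
    return checkpoint, None
```

With a single entry in `versions`, the loop ends after the first failed acceptance test, which corresponds to the "no" branch of the "MULTIPLE VERSION?" block 321.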

In the event that processor 11 crashes, the input AT 159
module of processors 19, 21, and 23 can recognize the crash
of processor 11 by detecting the fact that the message 253
from the processor 11 is not received within the
predetermined time. Then, the processor 23, the processor
which has been executing the primary version module Xp,
sends a message, indicating the crash of processor 11, to the
processor 13. After receiving this message indicating the
crash of processor 11, processor 13 executes the module Ap
71 as the first version.

In addition, processor 23 may detect the situation where
multiple producer processors, such as processors 11 and 13,
have executed the same version for processing the same data
when messages which have the same MI 109 part have
been received. Messages which have the same MI 109 part
have the same MN 111 part, the same AB 113 part and the same
SN 115 part. In this case, the processor 23 sends a
message indicating that both of the processors 11 and 13
have executed the same version. When the module Ap 71 has
been used as the first version in both processors 11 and 13,
the processor 13 uses the module Ab 73 as the first version
after receiving this message. When the module Ab 73 has
been used as the first version in both processors 11 and 13,
the processor 11 will use the module Ap 71 as the first
version after receiving this message. This mechanism is
called the version switching suggestion (VSS) mechanism.
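
A consumer-side check for the VSS mechanism might look like the following Python sketch. The field names mirror the MN, AB and SN parts of the MI 109 part, but their types, the version labels "p" and "b", and the helper names `find_same_version_producers` and `suggest_switch` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Message:
    sender: int  # producer processor, e.g. 11 or 13
    mn: str      # MN part (treated here as an opaque key)
    ab: str      # AB part; assumed to be "p" or "b" for the version run
    sn: int      # SN part (treated here as an opaque key)


def find_same_version_producers(
    messages: Iterable[Message],
) -> Dict[Tuple[str, str, int], List[int]]:
    """Group senders by identical MI (MN, AB, SN); keep groups of two or more."""
    by_mi: Dict[Tuple[str, str, int], set] = {}
    for msg in messages:
        by_mi.setdefault((msg.mn, msg.ab, msg.sn), set()).add(msg.sender)
    return {mi: sorted(s) for mi, s in by_mi.items() if len(s) > 1}


def suggest_switch(ab: str, primary_node: int, backup_node: int) -> Tuple[int, str]:
    """Return (processor that should switch, version it should run first)."""
    if ab == "p":
        # Both ran the primary version: the backup node returns to "b".
        return backup_node, "b"
    # Both ran the backup version: the primary node returns to "p".
    return primary_node, "p"
```

For example, two messages with identical MN, AB and SN parts from processors 11 and 13 and an AB part of "p" would lead `suggest_switch("p", 11, 13)` to name processor 13 as the one that should switch to the backup version.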

The utilization of this method, even if the primary pro- 
cessor which executes the primary version module as the 
first version crashes, enables assured execution of the pri- 
mary version module. Moreover, a failure of a version in 
processing input data does not necessarily lead to the 
immediate dropout of a given processor from a distributed 
recovery block computing station. Again, a mechanism for 
status exchange among processors is not necessary in this 
method. The elimination of a requirement for an exchange 
mechanism under the inventive device and method herein 
provides flexibility and tolerance of both software and 
hardware faults. 

Although the invention has been described with reference to
particular illustrative embodiments thereof, many changes 
and modifications of the invention may become apparent to 
those skilled in the art without departing from the spirit and 
scope of the invention. Therefore, included within the patent 
warranted hereon are all such changes and modifications as 
may reasonably and properly be included within the scope of 
this contribution to the art. 

We claim: 

1. A method for distributed redundant execution of program
modules in a distributed system which has at least two
processors connected by transmission medium and in which 
only one version of a program module is present and stored 
in each of at least two processors, said method comprising 
the steps of: 

asynchronously executing, in each of said at least two 
processors, said program module to produce result data, 
after receiving an input data message containing input 
data necessary for the execution; 

sending, by each of said at least two processors which 
executed said program module, to at least one destina- 
tion processor an information message which contains 
at least both a result data produced by said execution of 
said program module and said input data utilized during
said program module execution, through the transmis- 
sion medium; and 

receiving, in said at least one destination processor, said 
information messages and selecting one of the received 
information messages as a new input data message 
based upon both the result data and the input data 
contained in said received information messages. 

2. A method for distributed redundant execution of program
modules in a distributed system which has at least two
processors connected by transmission medium and in which
only one version of a program module is present and stored 
in each one of at least two processors, said method com- 
prising the steps of: 
asynchronously executing, in each of said at least two 
processors, said program module to produce result data, 
after receiving an input data message containing input 
data necessary for this execution; 
checking to conclusion, in each of said at least two 
processors which executed said program module, 
acceptability of the result data produced by said asyn- 
chronously executing said program module based only 
upon information within each of said at least two 
processors; 

sending, by each of said at least two processors which 
executed said program module and completed said 
checking with positive conclusion about acceptability, 
to at least one destination processor an information 
message which contains at least the result data pro- 
duced by said execution of said program module 
through the transmission medium; and 

receiving, in said at least one destination processor, said 
information messages and selecting one of the received 
information messages as a new input data message by 
comparing the contents of the result data in said
received information messages. 

3. A method for distributed redundant execution of pro- 
gram modules in a distributed system which has at least two 
processors connected by transmission medium and in which 
only one version of a program module is present and stored 
in each one of at least two processors, said method com- 
prising the steps of: 

asynchronously executing, in each of said at least two 
processors, said program module to produce result data, 
after receiving an input data message containing input 
data necessary for said execution; 

checking to conclusion, in each of said at least two 
processors which executed said program module, 
acceptability of the result data produced by said asyn- 
chronously executing said program module based only 
upon information within each of said at least two 
processors; 

sending, by each of said at least two processors which 
executed said program module and completed said 
checking with a positive conclusion about the accept- 
ability, to at least one destination processor an infor- 
mation message which contains at least both the result 
data produced by said execution of said program mod- 
ule and said input data utilized during said program 
module execution through the transmission medium; 
and 

receiving, in said at least one destination processor, said 
information messages and selecting one of the received 
information messages as a new input data message 
based on both the result data and the input data con- 
tained in said received information messages. 

4. A method for distributed redundant execution of pro- 
gram modules in a distributed system which has at least two 
processors connected by transmission medium and in which 
at least first and second different versions of a program 
module are stored in each one of at least two processors, said 
method comprising the steps of: 

asynchronously executing, in each of said at least two 
processors, any of said at least first and second different 
versions of a program module which is stored therein to 
produce result data, after receiving an input data message
containing input data necessary for said asynchronous
execution;

sending, by each of said at least two processors which 
executed a version of the program module, to at least 
one destination processor an information message 
which contains at least both the result data produced by 
said asynchronous execution of said version and the 
information about said asynchronously executed ver- 
sion and about identity of said sending processor 
through the transmission medium; 

receiving, in said at least one destination processor, said 
information messages and selecting an acceptable one 
of the received information messages as a new input 
data message; and 

sending, from each destination processor which selected 
one of the received information messages, a second 
message which identifies which of said at least two 
processors produced said information messages that 
were discovered to be unacceptable during said select- 
ing step, to the processors that produced said informa- 
tion messages. 

5. The method for distributed redundant execution of
program modules in a distributed system as recited in claim 
4, wherein said second message identifies which, if any, of 
said at least two processors executed a same version of the 
program module. 

6. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim
4, wherein said selection of one of the received information
messages is accomplished by comparing the contents of the 
result data in the received information messages. 

7. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim
5, wherein said selection of an acceptable one of the
received information messages is accomplished by compar- 
ing the contents of the result data in the received information 
messages. 

8. A method for distributed redundant execution of pro- 
gram modules in a distributed system which has at least two 
processors connected by transmission medium and in which 
at least first and second different versions of a program 
module are stored in each one of at least two processors, said 
method comprising the steps of: 

asynchronously executing, in each of said at least two 
processors, any one of said at least first and second 
different versions of a program module which is stored 
therein to produce result data, after receiving an input 
data message containing input data necessary for said 
asynchronous execution; 

sending, by each of said at least two processors, to at least 
one destination processor an information message 
which contains at least: 

the result data produced by said execution of said one
of said at least first and second different versions;

said input data utilized during said execution of said at
least first and second different versions; and

information about said executed version through
the transmission medium;

receiving, in said at least one destination processor, said
information messages and selecting one of the information
messages as a new input data message based on
the result data and the input data contained in said
information messages; and

sending, from said each destination processor which
selected one of the received information messages, a
second message which identifies which of said at least
two processors produced said information messages
that were discovered to be unacceptable during said
selecting step to the processor that produced said
information message.

9. The method for distributed redundant execution of
program modules in a distributed system as recited in claim
8, wherein said second message identifies which of said at
least two processors executed a same version of the program 
module. 

10. A method for distributed redundant execution of
program modules in a distributed system which has at least
two processors connected by transmission medium and in
which at least first and second different versions of a
program module are stored in each of at least two processors,
said method comprising the steps of:

asynchronously executing, in each of said at least two 
processors, any one of said at least first and second 
different versions of a program module to produce 
result data, after receiving an input data message
containing input data necessary for said asynchronous
execution; 

checking to conclusion, in each of said at least two 
processors which executed said any one of said at least 
first and second different versions of the program 
module, acceptability of the result data produced by
said execution based only upon information within 
each of said at least two processors; 

sending, by each of at least two processors which 
executed one of said at least first and second different
versions of the program module, to at least one destination
processor an information message which contains
at least:

the result data produced by said execution of said any
one version and

the information about said asynchronously executed 
version and about identity of said sending processor 
through the transmission medium; 

and 

receiving, in said at least one destination processor, said
information messages and selecting one of the information
messages as a new input data message.

11. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim 
10, wherein said selecting one of said information messages
step is accomplished by selecting a first information message
received.

12. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim 
10, wherein said selecting one of said information messages
step is accomplished based on the identities of said at least 
first and second versions of a program module that produced 
the information messages. 

13. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim
10, wherein said selecting one of said information messages 
step is accomplished by comparing the contents of the result 
data in the received information messages. 

14. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim
13, wherein each destination processor which selected one
of the information messages sends a second message which
identifies which of said at least two processors produced said
information messages that were discovered to be unacceptable
during said selecting step, to the processors that produced
said information messages.

15. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim 
14, wherein said second message identifies which, if any, of 
said at least two processors executed a same version of the 
program module. 

16. A method for distributed redundant execution of 
program modules in a distributed system which has at least 
two processors connected by transmission medium and in 
which at least first and second different versions of a 
program module are stored in each of at least two proces- 
sors, said method comprising the steps of: 

asynchronously executing, in each of said at least two 
processors, any of said at least first and second different 
versions of a program module which is stored therein to 
produce result data, after receiving a message contain- 
ing input data necessary for said asynchronous execu- 
tion; 

checking to conclusion, in each of said at least two 
processors which executed said one of said at least first 
and second different versions of the program module, 
acceptability of the result data produced by said asyn- 
chronously executing said one version of said program 
module based only upon information within each of 
said at least two processors; 

sending, by each of at least two processors which 
executed one of said at least first and second different 
versions of the program module, to at least one desti- 
nation processor an information message which con- 
tains at least: 

the result data produced by said asynchronously 
executing said one version of said program module; 

said input data utilized during said one version execu- 
tion; and 

information about said executed version and about 
identity of said sending processor through the trans- 
mission medium; 

and 

receiving, in said at least one destination processor, said 
information messages and selecting one of the infor- 
mation messages as a new input data message based on 
the result data and the input data contained in said 
received information messages. 

17. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim 
16, wherein each destination processor which selected one 
of the received messages sends a second message which 
identifies which of said at least two processors produced said 
information messages that were discovered to be unaccept- 
able during said selecting step, to the processors that pro- 
duced said information messages. 

18. The method for distributed redundant execution of 
program modules in a distributed system as recited in claim 
16, wherein said second message identifies which, if any, of
said at least two processors executed a same version of the 
program module. 

*****

ABSTRACT



A method and apparatus are disclosed for achieving collec- 
tive consistency in the detection and reporting of failures in 
a distributed computing system having multiple processors. 
Each processor is capable of being called by a parallel 
application for system status. Initially, each processor sends 
the other processors its view on the status of the processors. 
It then waits for similar views from other processors except 
those regarded as failed in its own view. If the received 
views are identical to the view of the processor, the proces- 
sor returns its view to the parallel application. In a preferred 
embodiment, if the views are not identical to its view, the 
processor sets its view to the union of the received views and 
its current view. The steps are then repeated. Alternatively, the
steps are repeated if the processor does not have information 
that each of the processors not regarded as failed in its view 
forms an identical union view. In another preferred 
embodiment, the method is terminated if a quorum is not 
formed by the processors which are not regarded as failed. 
Alternatively, after sending its view, the processor waits for 
an exit condition. Depending on the exit condition, the 
processor sets its view to a quorum view and sends a
"DECIDE" message to the other processors. In another
embodiment, the processor updates its view and the method 
steps are repeated. 

47 Claims, 9 Drawing Sheets

"DECIDE" message from one of the other processors, and
the "DECIDE" message includes a view VF which contains
the view VF of processor i. In this case, the view VF of
processor i is set to the received view VF according to the
method step 70. The processor i then sends a special "DECIDE"
message with VF to all other processors and returns from the
message interface 9 with its view according to steps 68 and 47,
respectively, similar to the first exit condition.

Still referring to the flowchart depicted in FIG. 9, block 73
represents the third exit condition from the waiting step 65
of the method. This condition occurs when the processor i
has sufficient information to determine that there is only one
view VF that any processor could possibly have received
from a quorum of processors and VF contains the view of
the processor i. According to step 74 of the method, the view
VF of processor i is set to be equal to that view VF. Next,
the method continues by repeating the method steps starting
with step 41. As an example, assuming the adopted quorum
family is the family of majority sets of processors, the
processor i would exit step 65 via the condition block 73 if
it has received views VF from exactly half the processors
participating in the protocol so that no processor would
receive a different view from a quorum of the processors.
Next, the view of processor i is updated with view VF. The
method steps are then iterated beginning with step 41.

The fourth exit condition is shown by block 75 of the
flowchart of FIG. 9. The condition holds true when the
processor i has sufficient information to determine that no
participant processor has received identical views from a
quorum of processors. In this case, the view VF of the
processor i is updated to be the same as the union of all
views it received, according to step 76. As a result, the set
of failed processors reported in VF is the union of the sets
of failed processors reported in all the views received by
processor i. The method steps are next repeated starting with
step 41, as in the case of the third exit condition. Thus, in
step 42, processor i sends its updated view VF before
checking again for one of the exit conditions in block 65. For
example, if the adopted quorum family is the family of
majority sets of processors, processor i would exit the
waiting step 65 via condition block 75 if processor i had
received pairwise distinct views from two more than half the
processors.
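
The two building blocks used by the exit conditions, a majority-quorum check and the union update, can be sketched in Python. This is a deliberately simplified illustration: a view is assumed to be a set of failed processor identifiers, the quorum family is assumed to be strict majorities, and `next_view` collapses the exit conditions into just two cases (decide on a majority view that contains the local view, otherwise take the union). All names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

View = FrozenSet[int]  # identifiers of processors regarded as failed


def union_view(views: Iterable[Iterable[int]]) -> View:
    """Union of the failed sets reported in all given views."""
    merged = set()
    for view in views:
        merged |= set(view)
    return frozenset(merged)


def majority_view(
    received: Dict[int, Iterable[int]], n_processors: int
) -> Optional[View]:
    """Return the view sent identically by a strict majority, if any."""
    counts = Counter(frozenset(v) for v in received.values())
    for view, count in counts.items():
        if 2 * count > n_processors:
            return view
    return None


def next_view(
    own: Iterable[int], received: Dict[int, Iterable[int]], n_processors: int
) -> Tuple[View, bool]:
    """Return (new view, decided?) for one round of the simplified protocol."""
    agreed = majority_view(received, n_processors)
    if agreed is not None and frozenset(own) <= agreed:
        # A quorum sent the same view and it contains the local view:
        # adopt it; the caller would then send a DECIDE message.
        return agreed, True
    # Otherwise merge everything seen so far and run another round.
    return union_view(list(received.values()) + [own]), False
```

Because any two strict majorities intersect, at most one view can be returned by `majority_view` in a given round, which is the property the quorum family is chosen for.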

FIG. 10 illustrates, in the form of a flowchart, a preferred
embodiment of a method for detecting failures in a distributed
system to be practiced with the method for achieving
collective consistency of the present invention. The embodiment
is referred to as a Multiphase Packet Oriented Failure
Detector, which may be used as the failure detector 32 shown
in the flowchart of FIG. 4. Again, the notation Vx(m) is used
to indicate the value corresponding to processor m from the
data structure Vx, where Vx ranges over the data structures
VR, VN, VT, VI, and VF.

The method steps of the failure detector are performed
logically concurrently for each processor 3 of the distributed
computing system 1. In summary, the method includes the
first step of determining by each processor 3 a pattern of
failures by all the processors to meet certain deadlines. In the
second step, a local view is established by the processor 3 as
to which of the processors have failed, using the determined
pattern of failures. Next, the processor 3 exchanges its local
view with the views similarly formed by the other processors.
The exchange is based on a collective consistency
protocol which assures that if any two processors 3 return
their views and one processor is not regarded as failed in the
view of the other processor, then the two returned views are
identical. As a result of performing these method steps,
failed processors 3 are consistently detected and reported by
all the processors.
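
The collective consistency property stated above can be written as a small predicate. In this Python sketch, `returned` is a hypothetical mapping from each processor that returned to the set of processors it regards as failed; the function name is likewise an assumption.

```python
from typing import Dict, Set


def is_collectively_consistent(returned: Dict[int, Set[int]]) -> bool:
    """Check the property over every ordered pair of returning processors."""
    for i, view_i in returned.items():
        for j, view_j in returned.items():
            # If i does not regard j as failed, their views must be identical.
            if i != j and j not in view_i and view_i != view_j:
                return False
    return True
```

Two processors that each regard the other as failed place no constraint on each other, so `{1: {2}, 2: {1}}` satisfies the property, while `{1: {3}, 2: {3, 4}}` does not.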

In the preferred embodiment illustrated by the flowchart
of FIG. 10, the method steps are executed logically concurrently
by each processor i for each processor m for which
processor i has VT(m)>0. The notation VT(m) represents the
total number of data packets to be received by processor i
from each of the other processors m. In step 81, a nonlocking
multireceive call with input parameter VR is issued.
Accordingly, processor i waits until all data packets VR(m)
from each processor m are received. In step 82, the method
determines whether a certain condition is satisfied before
processor i exits the waiting step 81. In a preferred embodiment
of the invention, the exit condition is whether all
VR(m) packets are received by the processor i before a
timeout. In the case of the affirmative branch from step 82,
i.e., when VN(m)=0, the corresponding failure indicator
VI(m) is set to zero (0), according to step 83. Next, in step
84, the number of packets to be received VT(m) is decremented
by the number of requested packets actually
received by processor i from processor m, i.e., by the value
of VR(m)-VN(m). The method continues with step 80 to
begin another round of receiving packets.

In the case of the negative branch from the decision block 
82, the failure indicator VI(m) corresponding to processor m 
is incremented by VN(m) according to step 85. The value of 
VI(m) is then compared with a threshold for processor i to 
receive packets from the other processors. In a preferred 
embodiment, the interface 2 includes a broadcast medium 
and the threshold is for processor i to receive packets sent to
the broadcast medium by the other processors m. 

If the failure indicator VI(m) exceeds the threshold,
processor i concludes that processor m has either failed or is
too slow in its operation. The processor i then sets its view
VF(m) to one (1) to indicate that processor m has failed and
sets VT(m) to zero (0) to indicate that no further packets are
to be requested from processor m. The method continues
with step 80 to begin another round of sending and receiving
packets. In the case the failure indicator VI(m) does not
exceed the threshold, step 84 is performed and the method
similarly continues with step 80.
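
The per-round bookkeeping of steps 82 through 85 can be sketched as a single Python function. The sketch assumes the five data structures are dictionaries keyed by the sender m and that VN(m) holds the number of requested packets still missing when the wait ended; the function name `account_round` and the `threshold` parameter are hypothetical.

```python
from typing import Dict


def account_round(
    m: int,
    vr: Dict[int, int],  # VR(m): packets requested from m this round
    vn: Dict[int, int],  # VN(m): requested packets not received in time
    vt: Dict[int, int],  # VT(m): packets still to be received from m
    vi: Dict[int, int],  # VI(m): accumulated failure indicator for m
    vf: Dict[int, int],  # VF(m): 1 once m is regarded as failed
    threshold: int,
) -> None:
    """Update the counters for sender m after one receive round."""
    if vn[m] == 0:
        # Step 83: everything arrived before the timeout; clear the indicator.
        vi[m] = 0
    else:
        # Step 85: accumulate the number of missed packets.
        vi[m] += vn[m]
        if vi[m] > threshold:
            # m is regarded as failed or too slow; request nothing further.
            vf[m] = 1
            vt[m] = 0
            return
    # Step 84: subtract the packets actually received, VR(m) - VN(m).
    vt[m] -= vr[m] - vn[m]
```

Resetting VI(m) after a fully successful round means that only consecutive shortfalls accumulate toward the threshold, so a sender that is occasionally late is not immediately regarded as failed.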

In another preferred embodiment of the method for 
detecting faults illustrated by flowchart of FIG. 4 and based 
on the failure detector of FIG. 10, the method further 
includes the step of reorganizing remaining work to be 
performed by the distributed system 1 among the still 
operational processors 3 once failed processors are detected. 

While several preferred embodiments of the invention 
have been described, it should be apparent that modifica- 
tions and adaptations to those embodiments may occur to 
persons skilled in the art without departing from the scope 
and the spirit of the present invention as set forth in the 
following claims. 

What is claimed is: 

1. A computer-implemented method for achieving collec- 
tive consistency in detecting failures in a distributed com- 
puting system, the system having a plurality of processors 
participating in a collective call by a parallel application, the 
method comprising the steps, performed for each processor, 
of: 

sending, by the processor to the other processors, a view 
representing the status of the other processors, includ- 
ing status as to which of the other processors have 
failed; 

waiting until the processor receives views from all other 
processors except those regarded as failed in the view 
of the processor, the views being sent by execution of
the step of sending; and

returning the view of the processor to the parallel application
if the received views are identical to the view of
the processor,

whereby collective consistency is achieved, collective 
consistency being a condition which holds whenever it 
is true that, if any two processors return their views to 
the parallel application and one of the two processors is 
not regarded as failed in the view of the other processor, 
then the two views are identical. 

2. The method as recited in claim 1 further comprising the 
steps, performed immediately before the step of returning if 
the received views are not identical to the view of the 
processor, of: 

updating the view of the processor to be the same as a 
union of all the received views and the current view of 
the processor; and 

repeating the steps of the method.
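
The send, wait, compare, and union steps of claims 1 and 2 can be sketched as follows. This is a simplified simulation under stated assumptions, not the claimed implementation: views are sets of failed processor ids, rounds are synchronous, messages between running processors are never lost, and the driver does not model a processor blocking on a crashed sender that is not yet in its view. The names `step` and `run` are hypothetical.

```python
def step(own_view, received_views):
    """One pass for a single processor: return (decided, view)."""
    own = frozenset(own_view)
    if all(frozenset(v) == own for v in received_views):
        # All received views match: the view may be returned (claim 1).
        return True, own
    # Otherwise adopt the union of all views and repeat (claim 2).
    return False, own.union(*received_views)


def run(initial_views, max_rounds=10):
    """Drive all running processors until every one of them decides."""
    views = {p: frozenset(v) for p, v in initial_views.items()}
    for _ in range(max_rounds):
        new_views, all_decided = {}, True
        for p, own in views.items():
            # p waits only for senders it does not regard as failed.
            received = [views[q] for q in views if q != p and q not in own]
            decided, new_views[p] = step(own, received)
            all_decided = all_decided and decided
        views = new_views
        if all_decided:
            return views
    raise RuntimeError("no agreement within max_rounds")
```

For example, if processors 0 and 2 regard processor 3 as failed and processor 1 does not, the first round makes every view equal to the union and the second round lets every processor return it.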

3. The method as recited in claim 1 further comprising the 
steps, performed immediately before the step of returning, 
of: 

updating the view of the processor to be the same as a 
union of all the received views and the current view of 
the processor; 

determining whether the processor has information that 
each of the other processors not regarded as failed in 
the union forms a union identical to the union of the 
processor; and 

repeating the steps of the method. 

4. The method as recited in claim 2 further comprising the 
steps, performed immediately after the step of sending, of: 

determining whether the processors not regarded as failed
in the view of the processor form a quorum, the quorum
being any element of a previously chosen plurality of
subsets of a set, the subsets having a property that any
two of the subsets intersect; and

terminating the method without returning to the parallel
application if the quorum is not formed.
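
One way to picture the quorum of claim 4 is a majority system, in which every subset containing more than half of the processors is a quorum, so that any two quorums intersect. The sketch below is an assumption chosen for illustration; the claim permits any previously chosen family of pairwise intersecting subsets, and the function names are hypothetical.

```python
from itertools import combinations


def majority_quorums(processors):
    """Build every subset holding more than half of the processors."""
    procs = sorted(processors)
    need = len(procs) // 2 + 1
    return {
        frozenset(c)
        for size in range(need, len(procs) + 1)
        for c in combinations(procs, size)
    }


def pairwise_intersecting(subsets):
    """Check the property that any two of the subsets intersect."""
    return all(a & b for a, b in combinations(subsets, 2))


def forms_quorum(all_processors, view, quorums):
    """True if the processors not regarded as failed form a quorum."""
    live = frozenset(all_processors) - frozenset(view)
    return live in quorums
```

With five processors, a view that regards two processors as failed still leaves a quorum of three, while a view that regards three as failed does not, so the method would terminate without returning.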

5. The method as recited in claim 1, wherein 
the step of waiting is performed until: 

a group of the processors form a quorum, the quorum 
being any element of a previously chosen plurality of 
subsets of a set, the subsets having a property that 
any two of the subsets intersect; 

the views received from the processors in the quorum 
are identical; and 

each of the identical views contains the view of the
processor; and

the method further comprises the steps, performed
immediately before the step of returning, of:

updating the view of the processor to be the same as
the identical views; and

sending a DECIDE message to all other processors,
the message including the updated view of the
processor.
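
The waiting condition of claim 5 can be sketched as a check over the views received so far. The sketch assumes that views are sets of failed processor ids, that "contains" means set inclusion, and that a group forms a quorum when some chosen quorum is a subset of the senders that reported the same view. The name `try_decide` is hypothetical.

```python
def try_decide(own_view, received, quorums):
    """Return the view to adopt and send in a DECIDE message, or None.

    `received` maps a sender id to the view that sender reported.
    `quorums` is the previously chosen family of intersecting subsets.
    """
    own = frozenset(own_view)
    senders_by_view = {}
    for sender, view in received.items():
        senders_by_view.setdefault(frozenset(view), set()).add(sender)
    for view, senders in senders_by_view.items():
        # The identical views must contain the processor's own view, and
        # the senders holding them must include a whole quorum.
        if own <= view and any(q <= senders for q in quorums):
            return view
    return None
```

A caller would update its view to the returned value and then send DECIDE to all other processors; a `None` result means the processor keeps waiting.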

6. The method as recited in claim 1, wherein 
the step of waiting is performed until: 

the processor receives a DECIDE message from one of
the other processors; and

the message has a view which includes the current view
of the processor; and

the method further comprises the steps, performed
immediately after the step of waiting, of:

updating the view of the processor to be the same as
the view of the message; and

sending a DECIDE message to all other processors,
the message including the updated view of the
processor.

7. The method as recited in claim 1, wherein
the step of waiting is performed until:

the processor has information of a second view that
contains the view of the processor; and

the second view is the only view that could possibly be
held by a quorum, the quorum being any element of
a previously chosen plurality of subsets of a set, the
subsets having a property that any two of the subsets
intersect; and

the method further comprises the steps, executed
immediately after the step of waiting, of:

updating the view of the processor to be the same as
the second view; and

repeating the steps of the method.

8. The method as recited in claim 1, wherein
the step of waiting is performed until:

the processor has information that none of the other
processors received identical views from a group of
the processors, the group forming a quorum, the
quorum being any element of a previously chosen
plurality of subsets of a set, the subsets having a
property that any two of the subsets intersect; and

the method further comprises the steps, performed
immediately after the step of waiting, of:

updating the view of the processor to be the same as
a union of the views received and the current view
of the processor; and

repeating the steps of the method.

9. A computer-implemented method for detecting and
reporting failures among a plurality of processors of a
distributed computing system, the processors participating
in a collective call by a parallel application, the method
comprising the steps of:

determining, by each processor of the call, a pattern of
failures by the processors to meet a plurality of dead-
lines;

forming a local view, by the processor, as to which ones
of the processors have failed, the local view being
based on the pattern; and

exchanging the local view with the views of the other
processors, the exchange being based on a collective
consistency protocol such that if any two processors
return their views to the parallel application and one of
the two processors is not regarded as failed in the view
of the other processor, then the two views are identical,

whereby failed ones of the processors are consistently
detected and reported by the processors.

10. The method as recited in claim 9, wherein the col-
lective consistency protocol is such that if any two proces-
sors participating in a collective call return their views to the
parallel application, then the views returned are identical.

11. The method as recited in claim 9, wherein the collec-
tive consistency protocol is such that if any failure is
detected locally by a processor participating in a collective
call and the processor returns to the parallel application, then
the failure is reported to the interface.

12. The method as recited in claim 9, wherein the inter-
face includes a broadcast medium and the deadlines are for
the processors to receive a plurality of packets from the
broadcast medium, the packets being sent to the broadcast
medium by the processors.

13. The method as recited in claim 12, wherein the step of 
determining includes the step of associating with each 
pattern a numerical measure relative to a threshold for 
receiving the packets by the processors. 

14. The method as recited in claim 13, wherein the
numerical measure is a number of consecutive failures by a
processor to meet a deadline.

15. The method as recited in claim 9 further comprising
the step of reorganizing work to be performed by the
distributed computing system among the processors not
detected as failed.
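
Claims 13 to 15 can be pictured with a small sketch: a local view is formed from the number of consecutive missed deadlines compared against a threshold, and the remaining work is then spread over the processors not detected as failed. Everything below is an assumption made for illustration, including the use of the most recent run of misses as the numerical measure, the round-robin reassignment policy, and the names `consecutive_misses`, `form_local_view`, and `reassign`.

```python
def consecutive_misses(pattern):
    """Count the most recent run of missed deadlines.

    `pattern` lists deadline outcomes, oldest first; True means missed.
    """
    count = 0
    for missed in reversed(pattern):
        if not missed:
            break
        count += 1
    return count


def form_local_view(patterns, threshold):
    """Regard a processor as failed once its run reaches the threshold."""
    return {
        proc
        for proc, pattern in patterns.items()
        if consecutive_misses(pattern) >= threshold
    }


def reassign(tasks, processors, failed):
    """Spread tasks round-robin over processors not detected as failed."""
    live = sorted(p for p in processors if p not in failed)
    if not live:
        raise ValueError("no operational processors remain")
    plan = {p: [] for p in live}
    for index, task in enumerate(tasks):
        plan[live[index % len(live)]].append(task)
    return plan
```

In practice the local view would first pass through the collective consistency protocol, so that all operational processors reorganize work against the same set of failed processors.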

16. A processor for use in a distributed computing system 
of processors, the processors participating in a collective call 
by a parallel application for status of the distributed system, 
the processor comprising: 

means for sending to the other processors of the distrib- 
uted system a view of the processor on the status of the 
other processors, including status as to which of the 
other processors have failed; 

means for waiting until the processor receives the views 
from all other processors except those regarded as 
failed in the view of the processor, the views being sent 
by execution of the step of sending; and 

means for returning the view of the processor to the 
parallel application if the received views are identical 
to the view of the processor, 

whereby collective consistency is achieved among the 
processors as to the status of the processors returned to 
the parallel application, collective consistency being a 
condition which holds whenever it is true that, if any 
two processors return their views to the parallel appli- 
cation and one of the two processors is not regarded as 
failed in the view of the other processor, then the two 
views are identical. 

17. The processor as recited in claim 16 further compris- 
ing 

means for updating the view of the processor to be the 
same as a union of all the views received and the 
current view of the processor, if the received views are 
not identical to the view of the processor. 

18. The processor as recited in claim 16 further compris- 
ing: 

means for updating the view of the processor to be the 
same as a union of all the views received and the 
current view of the processor; 

means for determining whether the processor has infor- 
mation that each of the other processors not regarded as 
failed in the union forms a union identical to the union 
of the processor; and 

means for repeating the operation of the processor if the 
processor does not have the information. 

19. The processor as recited in claim 17 further compris- 
ing: 

means for determining whether the processors not 
regarded as failed in the view of the processor form a 
quorum, the quorum being any element of a previously 
chosen plurality of subsets of a set, the subsets having 
a property that any two of the subsets intersect; and 

means for terminating without returning to the parallel 
application if the quorum is not formed, the means for 
determining the quorum and terminating means oper- 
ating immediately after the operation of the means for 
sending. 

20. The processor as recited in claim 16, wherein 
the waiting means operates until: 

a group of the processors form a quorum, the quorum 
being any element of a previously chosen plurality of 
subsets of a set, the subsets having a property that 
any two of the subsets intersect; 

the views received from the processors in the quorum 
are identical; and 

each of the identical views contains the view of the
processor; and

the processor further comprises:

means for updating the view of the processor to be
the same as the identical views; and

means for sending a DECIDE message to all other
processors, the message including the updated
view of the processor.

21. The processor as recited in claim 16, wherein 
the means for waiting operates until: 

the processor receives a DECIDE message from one of
the other processors; and

the message has a view which includes the current view
of the processor; and

the processor further comprises:

means for updating the view of the processor to be
the same as the view of the message; and

means for sending a DECIDE message to all other
processors, the message including the updated
view of the processor.

22. The processor as recited in claim 16, wherein
the means for waiting operates until:

the processor has information of a second view that
contains the view of the processor; and

the second view is the only view that could possibly be
held by a quorum, the quorum being any element of
a previously chosen plurality of subsets of a set, the
subsets having a property that any two of the subsets
intersect; and

the processor further comprises:

means for updating the view of the processor to be
the same as the second view; and

means for repeating the operation of the processor.

23. The processor as recited in claim 16, wherein 
the waiting means operates until: 

the processor has information that none of the other
processors received identical views from a group of
the processors, the group forming a quorum, the
quorum being any element of a previously chosen
plurality of subsets of a set, the subsets having a
property that any two of the subsets intersect; and

the processor further comprises:

means for updating the view of the processor to be
the same as a union of the views received and the
current view of the processor; and

means for repeating the operation of the processor.

24. A processor for use in a distributed computing system
of processors, the processors participating in a collective call
by a parallel application, the processor comprising:

means for determining a pattern of failures by the proces-
sors to meet a plurality of deadlines;

means for forming a local view as to which ones of the
processors have failed, the local view being based on
the pattern; and

means for exchanging the local view with the views of the
other processors, the exchanging means operating
according to a collective consistency protocol such that
if any two processors participating in the collective call
return their views to the parallel application and one of
the two processors is not regarded as failed in the view
of the other processor, then the two views are identical,
whereby failed ones of the processors are consistently
detected and reported by the processors.

25. The processor as recited in claim 24, wherein the
collective consistency protocol is such that if any two
processors participating in the collective call return their
views to the parallel application, then the views returned are
identical.

26. The processor as recited in claim 24, wherein the
collective consistency protocol is such that if any failure is
detected locally by a second processor participating in the
collective call and the second processor returns to the
parallel application, then the failure is reported to the
interface by the processor.

27. The processor as recited in claim 24, wherein the 
interface includes a broadcast medium and the deadlines are 
for the processors to receive a plurality of packets from the 
broadcast medium, the packets being sent to the broadcast 
medium by the processors. 

28. The processor as recited in claim 27, wherein the 
means for determining includes means for associating with 
the pattern a numerical measure relative to a threshold for 
receiving the packets by the processors. 

29. The processor as recited in claim 28, wherein the 
numerical measure is a number of consecutive failures by 
the processors to meet a deadline. 

30. A distributed computing system having a plurality of 
processors participating in a collective call by a parallel 
application for status of the system, each processor com- 
prising: 

means for sending to the other processors of the system a 
view on the status of the processors, including status as 
to which of the other processors have failed; 

means for waiting until the processor receives views from 
all other processors except those regarded as failed in 
the view of the processor, the views being sent by 
execution of the step of sending; and 

means for returning the view of the processor to the 
parallel application if the received views are identical 
to the view of the processor, 

whereby collective consistency is achieved among the 
processors as to the status of the processors returned to 
the parallel application, collective consistency being a 
condition which holds whenever it is true that, if any 
two processors return their views to the parallel appli- 
cation and one of the two processors is not regarded as 
failed in the view of the other processor, then the two 
views are identical. 

31. A distributed computing system having a plurality of
processors participating in a collective call by a parallel
application, each processor comprising:

means for determining a pattern of failures by the proces-
sors to meet a plurality of deadlines;

means for forming a local view as to which ones of the
processors have failed, the local view being based on
the pattern; and

means for exchanging the local view with the views of the
other processors, the exchanging means operating
according to a collective consistency protocol such that
if any two processors return their views to the parallel
application and one of the two processors is not
regarded as failed in the view of the other processor,
then the two views are identical, whereby failed ones of
the processors are consistently detected and reported by
the processors.

32. The distributed computing system as recited in claim 
31 further comprising means for reorganizing work to be 
performed by the distributed computing system among the 
processors not detected as failed. 

33. A computer program product for achieving collective 
consistency in detecting failures in a distributed computing 
system, the system having a plurality of processors partici- 
pating in a collective call by a parallel application for status 
on the processors, the computer program product compris-
ing:

a recording medium;

means, recorded on the recording medium, for instructing
each of the processors to perform the steps of:

sending to the other processors a view representing the
status of the other processors, including status as to
which of the other processors have failed;

waiting until the processor receives the views from all
other processors except those regarded as failed in
the view of the processor, the views being sent by
execution of the step of sending; and

returning the view of the processor to the parallel
application if the received views are identical to the
view of the processor,

whereby collective consistency is achieved among the
processors as to the status of the processors returned to
the parallel application, collective consistency being a
condition which holds whenever it is true that, if any
two processors return their views to the parallel appli-
cation and one of the two processors is not regarded as
failed in the view of the other processor, then the two
views are identical.

34. The computer program product as recited in claim 33 
further comprising means, recorded on the recording 
medium, for instructing the processor to perform, immedi- 
ately before the step of returning if the received views are 
not identical to the view of the processor, the steps of: 

updating the view of the processor to be the same as a 
union of all the received views and the current view of 
the processor; and 

repeating the steps of the program product.

35. The computer program product as recited in claim 33 
further comprising means, recorded on the recording 
medium, for instructing the processor to perform, immedi- 
ately before the step of returning, the steps of: 

updating the view of the processor to be the same as a 
union of all the views received and the current view of 
the processor; 

determining whether the processor has information that 
each of the other processors not regarded as failed in 
the union forms a union identical to the union of the 
processor; and 

repeating the steps of the program product if the processor 
does not have the information. 

36. The computer program product as recited in claim 34
further comprising means, recorded on the recording 
medium, for instructing the processor to perform, immedi- 
ately after the step of sending, the steps of: 

determining whether the processors not regarded as failed 
in the view of the processor form a quorum, the quorum 
being any element of a previously chosen plurality of 
subsets of a set, the plurality of subsets having a 
property that any two elements of the plurality of 
subsets intersect; and 

terminating the method without returning to the parallel 
application if the quorum is not formed. 

37. The computer program product as recited in claim 33, 
wherein 

the step of waiting is performed until:

a group of the processors form a quorum, the quorum
being any element of a previously chosen plurality of
subsets of a set, the subsets having a property that
any two of the subsets intersect;

the views received from the processors in the quorum
are identical; and

each of the identical views contains the view of the
processor; and

the computer program product further comprises
means, recorded on the recording medium, for
instructing the processor to perform, immediately
before the step of returning, the steps of:

updating the view of the processor to be the same as
the identical views; and

sending a DECIDE message to all other processors,
the message including the updated view of the
processor.

38. The computer program product as recited in claim 33, 
wherein 

the step of waiting is performed until:

the processor receives a DECIDE message from one of
the other processors; and

the message has a view which includes the current view
of the processor; and

the computer program product further comprises
means, recorded on the recording medium, for
instructing the processor to perform, immediately
after the step of waiting, the steps of:

updating the view of the processor to be the same as
the view of the message; and

sending a DECIDE message to all other processors,
the message including the updated view of the
processor.

39. The computer program product as recited in claim 33, 
wherein 

the step of waiting is performed until: 

the processor has information of a second view that
contains the view of the processor; and

the second view is the only view that could possibly be
held by a quorum, the quorum being any element of
a previously chosen plurality of subsets of a set, the
plurality of subsets having a property that any two
elements of the plurality of subsets intersect; and

the computer program product further comprises
means, recorded on the recording medium, for
instructing the processor to perform, immediately
after the step of waiting, the steps of:

updating the view of the processor to be the same as the
second view; and

repeating the steps of the program product.

40. The computer program product as recited in claim 33, 
wherein 

the step of waiting is performed until: 

the processor has information that none of the other
processors received identical views from a group of
the processors, the group forming a quorum, the
quorum being any element of a previously chosen
plurality of subsets of a set, the subsets having a
property that any two of the subsets intersect; and

the computer program product further comprises
means, recorded on the recording medium, for
instructing the processor to perform, immediately
after the step of waiting, the steps of:

updating the view of the processor to be the same as
a union of the views received and the current view
of the processor; and

repeating the steps of the program product.

41. A computer program product for detecting and report-
ing failures in a distributed computing system, the system
having a plurality of processors participating in a collective
call by a parallel application, the computer program product
comprising:

a recording medium;

means, recorded on the recording medium, for instructing
each of the processors to perform the steps of:

determining a pattern of failures by the processors to
meet a plurality of deadlines;

forming a local view as to which ones of the processors
have failed, the local view being based on the
pattern; and

exchanging the local view with the views of the other
processors, the exchange being based on a collective
consistency protocol such that if any two processors
return their views to the parallel application and one
of the two processors is not regarded as failed in the
view of the other processor, then the two views are
identical, whereby failed ones of the processors are
consistently detected and reported by the processors.

42. The computer program product as recited in claim 41,
wherein the collective consistency protocol is such that if
any two processors participating in the collective call return
their views to the parallel application, then the two views
returned are identical.

43. The computer program product as recited in claim 41,
wherein the collective consistency protocol is such that if
any failure is detected locally by a processor participating in
the collective call and the processor returns to the parallel
application, then the failure is reported to the interface.

44. The computer program product as recited in claim 41,
wherein the interface includes a broadcast medium and the
deadlines are for the processors to receive a plurality of
packets from the broadcast medium, the packets being sent
to the broadcast medium by the processors.

45. The computer program product as recited in claim 44,
wherein the means for instructing the processor to perform
the step of determining includes means, recorded on the
recording medium, for instructing the processor to perform
the step of associating with each pattern a numerical mea-
sure relative to a threshold for receiving the packets by the
processors.

46. The computer program product as recited in claim 45,
wherein the numerical measure is a number of consecutive
failures by a processor to meet a deadline.

47. The computer program product as recited in claim 41
further comprising means, recorded on the recording
medium, for instructing the distributed computing system to
reorganize work to be performed by the distributed system
among the processors not detected as failed.

***** 



11/2/2005, EAST Version: 2.0.1.4 



(12) United States Patent

Sato et al.

(10) Patent No.: US 6,279,104 B1
(45) Date of Patent: *Aug. 21, 2001



(54) DEBUGGING SYSTEM FOR PARALLEL
PROCESSED PROGRAM AND DEBUGGING
METHOD THEREOF

(75) Inventors: Yuji Sato; Norihisa Murayama, both 
of Shizuoka (JP) 

(73) Assignee: Fujitsu Limited, Kawasaki (JP) 

( * ) Notice: This patent issued on a continued pros- 
ecution application filed under 37 CFR 
1.53(d), and is subject to the twenty year 
patent term provisions of 35 U.S.C. 
154(a)(2). 

Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 0 days. 

(21) Appl. No.: 09/047,952 

(22) Filed: Mar. 26, 1998 

(30) Foreign Application Priority Data 

Jul. 22, 1997 (JP) 9-196237 

(51) Int. Cl.7 G06F 11/00

(52) U.S. Cl. 712/227; 712/228; 712/229

(58) Field of Search 395/704; 345/440;
709/102, 300; 712/227, 228, 229

(56) References Cited 

U.S. PATENT DOCUMENTS 

5,179,702 * 1/1993 Spix et al. 709/102
5,325,530 * 6/1994 Mohrmann 395/704
5,640,500 * 6/1997 Taylor 345/440
5,687,375 * 11/1997 Schwiegelshohn 395/704
5,745,760 * 4/1998 Kawamura et al. 709/300
5,963,746 * 10/1999 Barker et al. 712/20

* cited by examiner 

Primary Examiner — Reba I. Elmore 

(74) Attorney, Agent, or Firm— Staas & Halsey LLP 

(57) ABSTRACT 

A debugging system for use with a data parallel processing 
apparatus is disclosed. Sequential debuggers debug a plu- 
rality of parallel processes. The processed result is output as 
reply information to a management processor. The manage- 
ment processor knows the reason why it has received reply 
information and manages debugging statuses of the indi- 
vidual sequential debuggers corresponding to the reply 
information against the debug command. 

35 Claims, 25 Drawing Sheets 



[Front-page drawing. Labels: USER; MANAGEMENT PROCESS; SOCKET; SERVER PROCESS (two); PIPE (two); SEQUENTIAL DEBUGGER (two); PARALLEL PROCESS (two). Reference numerals: 18, 20-1, 20-2, 21-1, 21-2, 22-1, 22-2.]




DEBUGGING SYSTEM FOR PARALLEL 
PROCESSED PROGRAM AND DEBUGGING 
METHOD THEREOF 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a data parallel processing 
apparatus for processing data with a plurality of processor 
elements. More particularly, the present invention relates to 
a debugging system for a parallel processed program that 
drives the processing apparatus and a debugging method 
thereof. 

2. Description of the Related Art 

In a scientific and engineering parallel computer that
repeats the same calculating process, or the same calculations
with varied parameters, a data parallel process is performed
in which the calculating process is divided among a plurality
of processors. In the data parallel processing apparatus, since
the same calculation program is executed by a plurality of
processor elements in parallel, the processor elements perform
their respective calculating processes with their respective
data and variables so as to accomplish a high-speed
calculating process.

On the other hand, in the final stage of software
development, a debugging process for checking source
code, executable format, and so forth, and for finding errors
in a program and in the data and variables of the program, is
essential. Conventionally, the debugging process is per-
formed by, for example, the following debugger.

The debugger debugs programs of individual processor
units. This type of debugger is referred to as a sequential
debugger. The sequential debugger debugs only a process
program in one processor element. The debugger tracks the
operation of a program of each processor element, stops,
for example, a source program at a particular line, and then
checks data and the contents of variables.

However, when a program for the data parallel processing 
apparatus with a plurality of processor elements connected 
in parallel is debugged, a complicated debugging process 
must be performed for each processor element. Thus, the
debugging process is inefficient and takes a long
time.

SUMMARY OF THE INVENTION 

An object of the present invention is to provide a debug-
ging system, and a debugging method thereof, for effectively
performing a debugging process for a parallel processed
program of a data parallel processing apparatus, thereby
contributing greatly to program development as an effective
debugging tool for parallel processed programs.

The present invention is accomplished by a debugging 
system for use with a data parallel processing apparatus 
having a plurality of processor elements for processing the
same process in parallel, the debugging system comprising 
a plurality of sequential debuggers for debugging process 
programs of the processor elements, and a management 
processor for outputting a debug command to the sequential 
debuggers, causing the sequential debuggers to debug the 
process programs, receiving reply information therefrom as 
the debugged results, and managing a debugging process for 
the data parallel processing apparatus. 

The plurality of processor elements (sometimes referred 
to as PEs) process the same program in parallel. For 
example, the processor elements process a parallel program 
in the scientific and engineering calculating process. Each
processor element can access a shared memory shared by
each processor element along with a dedicated local memory
thereof. For example, each processor can access the shared
memory along with a relevant dedicated local memory.

Each sequential debugger debugs only a program process
of a relevant processor element. The sequential debugger
performs a debugging process corresponding to a debug
command that is received from the management processor.
Examples of the debug command are "PRINT" command,
"BREAK" command, "CONTINUE" command, "STEP"
command, and "P.STAT" command.

In addition, the management processor outputs a debug
command to each processor element. The sequential debug-
ger of each processor element debugs the process program
corresponding to each processor element. The management
processor receives reply information from each processor
element as a result of the debugging process for data of the
relevant processor element.

Thus, the management processor outputs various debug
commands to the sequential debuggers of the individual
processor elements and receives reply information from the
sequential debuggers. Consequently, the management pro-
cessor can obtain the statuses of the programs of all the
processor elements and effectively manage the debugging
process. Thus, the management processor can effectively
perform the debugging process.
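
The command and reply flow summarized above can be sketched as follows. This is a minimal, single-process illustration under stated assumptions, not the disclosed implementation: the class names `SequentialDebugger` and `ManagementProcessor`, the `send` method, the reply strings, and the simplified behavior of the "PRINT" and "STEP" commands are all hypothetical, and the socket and pipe transport between processes is left out.

```python
class SequentialDebugger:
    """Hypothetical stand-in for the debugger of one processor element."""

    def __init__(self, pe_id, variables):
        self.pe_id = pe_id
        self.variables = dict(variables)
        self.line = 1  # current source line of the debugged program

    def handle(self, command, arg=None):
        """Carry out one debug command and return reply information."""
        if command == "PRINT":
            return f"{arg} = {self.variables.get(arg)}"
        if command == "STEP":
            self.line += 1
            return f"stopped at line {self.line}"
        return f"unsupported command: {command}"


class ManagementProcessor:
    """Sends one command to every debugger and records the replies."""

    def __init__(self, debuggers):
        self.debuggers = list(debuggers)
        self.last_reply = {}  # latest reply information per processor element

    def send(self, command, arg=None):
        replies = {d.pe_id: d.handle(command, arg) for d in self.debuggers}
        self.last_reply.update(replies)
        return replies
```

Keying each reply by the identifier of the processor element lets the management side tell which sequential debugger produced it, which is the role the identification information plays in the description.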

These and other objects, features and advantages of the
present invention will become more apparent in light of the
following detailed description of a best mode embodiment
thereof, as illustrated in the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a schematic diagram showing the structure of the 
data parallel processing apparatus for which the debugging 
35 system according to the present invention is applied; 

FIG. 2 is a schematic diagram for explaining a manage- 
ment processor; 

FIG. 3 is a block diagram showing the structure of a
debugging system according to the present invention, the
debugging system being applied to the data parallel pro-
cessing apparatus;

FIG. 4 is a schematic diagram showing the structure of
processor elements including sequential debuggers and par-
allel processors;

FIG. 5 is a schematic diagram showing display examples 
of windows displayed on a display of the management 
processor; 

FIG. 6 is a schematic diagram showing a display structure
in the case that only source programs are displayed on the
display;

FIG. 7 is a flow chart showing a process of the manage- 
ment processor; 

FIG. 8 is a flow chart for explaining a debug command
sending process performed by the management processor;

FIG. 9 is a flow chart for explaining a process performed 
by a sequential debugger; 

FIG. 10 is a flow chart for explaining a reply information
receiving process performed by the management processor;

FIG. 11 is a flow chart showing a transmission function 
processing program; 

FIG. 12 is a flow chart for explaining addition of identi-
fication information performed by a server processor;

FIGS. 13A and 13B are schematic diagrams showing data
in a status display portion after a "PRINT" command is
output;

"breaks" continue as shown in FIG. 21, one current process
is designated and thereby only "1: break" may be displayed.
Thus, the other "breaks" may be suppressed from being
displayed. However, in the example shown in FIG. 21, the
process should be performed after reply information of
"breaks" of all the sequential debuggers has been confirmed.

In the above-described embodiment, four debuggers were
used as sequential debuggers. However, as the number of
parallel processes of the parallel processing apparatus
increases, the number of sequential debuggers increases.
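
The suppression of repeated "break" replies described above, which matters more as the number of sequential debuggers grows, may be sketched in Python as follows. This is one possible reading only: the function names `group_replies` and `format_groups`, the comma-separated display format, and the `priority` parameter are assumptions made for illustration and are not taken from the embodiment.

```python
def group_replies(replies: dict) -> list:
    """Group processor elements that returned identical reply text.

    `replies` maps a processor element number to its reply string.
    Returns a list of (element_numbers, reply_text) pairs.
    """
    groups = {}
    for pe_id in sorted(replies):
        groups.setdefault(replies[pe_id], []).append(pe_id)
    return [(ids, text) for text, ids in groups.items()]


def format_groups(groups: list, priority=()) -> list:
    """Render one display line per group.

    Replies whose text appears in `priority` are shown first, in that
    order; all remaining groups keep their original relative order.
    """
    rank = {text: i for i, text in enumerate(priority)}
    ordered = sorted(groups, key=lambda g: rank.get(g[1], len(rank)))
    return [f"{','.join(map(str, ids))}: {text}" for ids, text in ordered]
```

With replies of "break" from elements 0, 1 and 3 and a different reply from element 2, the display shrinks from four lines to two, and a redundant reply is shown once for the whole group rather than once per debugger.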

FIG. 25 is a schematic diagram showing a storing medium 
that stores a program with which the embodiment is accom- 
plished. 

As described above, according to the present invention,
the following effects can be accomplished.

When parallel processed programs of a data parallel
processing apparatus are debugged, since the user can use a
management processor that manages each sequential
debugger, the debugging process can be very effectively
performed.

In addition, corresponding to the function of a debug
command, since a plurality of processes can be performed at
a time, the debugging process can be very quickly per-
formed.

Although the present invention has been shown and
described with respect to a best mode embodiment thereof,
it should be understood by those skilled in the art that the
foregoing and various other changes, omissions, and addi-
tions in the form and detail thereof may be made therein
without departing from the spirit and scope of the present
invention.

What is claimed is: 

1. A debugging system for use with a data parallel
processing apparatus having a plurality of processor ele-
ments for processing the same process in parallel, the
debugging system comprising:

a plurality of sequential debuggers to debug process
programs of the processor elements, wherein each
processor element has a local memory to store data
peculiar to each said processor element and a common
memory to store data common to all processor elements
such that during execution of the respective debug
process programs, each processor element outputs
reply information in response to receipt of a debug
command; and

a management processor to output the debug command to
said sequential debuggers, to cause said sequential
debuggers to debug the process programs, to receive
the reply information therefrom as debugged results,
and to add identification to the reply information rep-
resenting that the reply information from each respec-
tive sequential debugger has been received.

2. The debugging system as set forth in claim 1, further
comprising:

a server processor to control a corresponding sequential
debugger at an instruction of said management
processor, wherein said management processor outputs
the debug command to said sequential debuggers
through said server processor and receives reply
information from said sequential debuggers through the
server processor, data between said management pro-
cessor and the server processor being communicated on
a socket communication basis, data between the server
processor and said sequential debuggers being commu-
nicated on a pipe communication basis.

3. The debugging system as set forth in claim 1, 
wherein said management processor does not output the 

next debug command until said management processor 
receives reply information from all of said sequential 
debuggers. 

4. The debugging system as set forth in claim 1, 
wherein said management processor displays a status 

when reply information is not received from all of said 
sequential debuggers. 

5. The debugging system as set forth in claim 1, 
wherein the debug command uses a transmission function 

contained in a source program using variables stored in 
global memory. 

6. The debugging system as set forth in claim 1, 
wherein the debug command is a status display command. 

7. The debugging system as set forth in claim 1,
wherein the debug command is output to a selected one of
said sequential debuggers.

8. The debugging system as set forth in claim 7, 
wherein said management processor selects said sequen- 
tial debuggers. 

9. The debugging system as set forth in claim 1, 
wherein the debug command is output to all of said 

sequential debuggers. 

10. The debugging system as set forth in claim 9,
wherein said management processor selects said sequen- 
tial debuggers. 

11. The debugging system as set forth in claim 1, 
wherein the reply information is displayed in a basic 

window of a display of said management processor. 

12. The debugging system as set forth in claim 11, 
wherein a process window for displaying the same source 

program of the processor elements or displaying dif- 
ferent source programs of the processor element in 
parallel is generated on the display. 

13. The debugging system as set forth in claim 1, 
wherein the status display is displayed in a basic window 

of a display of said management processor. 

14. The debugging system as set forth in claim 13, 
wherein a process window for displaying the same source 

program of the processor elements or displaying dif- 
ferent source programs of the processor element in 
parallel is generated on the display. 

15. The debugging system as set forth in claim 1, 
wherein the reply information for each of said sequential 

debuggers is grouped and displayed. 

16. The debugging system as set forth in claim 15, 
wherein the grouped reply information is displayed in an 

order of priority previously specified. 

17. The debugging system as set forth in claim 15, 
wherein the reply information is not displayed when the 

reply information is redundant information. 

18. A debugging management processor for use with a
sequential debugger for a data parallel processing apparatus
having a plurality of processor elements for processing the
same process in parallel, wherein each processor element
has a local memory to store data peculiar to each said
processor element and a common memory to store data
common to all processor elements, the debugging manage-
ment processor comprising means for outputting a debug
command to said sequential debugger for debugging
process programs of the processor elements, means for
causing the sequential debugger to debug the process
programs, means for receiving reply information as the
debugged result from the sequential debugger, and means
for managing a debugging process of the data parallel
processing apparatus by adding identification to the reply
information representing that the reply information from
each respective sequential debugger has been received.

19. A readable storage medium for storing a program for 
causing a computer to perform: 

(a) outputting a debug command to a process element for
processing the same parallel program, wherein the
process element has a local memory to store data
peculiar to the process element and a common memory
to store data common to a plurality of process elements;
and

(b) receiving reply information of which a data parallel
processed program of the process element has been
debugged corresponding to the debug command that
has been output by the operation (a) and managing a
debugging process of the data parallel processed pro-
gram by adding identification to the reply information
representing that the reply information from the process
element has been received.

20. A debugging method of a data parallel processed
program of which a plurality of process elements process the
same processes in parallel, comprising:

(a) outputting a debug command to the process elements,
wherein each process element has a local memory to
store data peculiar to each said process element and a
common memory to store data common to all process
elements;

(b) debugging data parallel processed programs of the
process elements corresponding to the debug command
that has been output at the operation (a); and

(c) receiving reply information as a result at the operation
(b) and managing a debugging process of the data
parallel processed program by adding identification to
the reply information representing that the reply infor-
mation from each respective process element has been
received.

21. The debugging method as set forth in claim 20,
wherein the debug command is a status display command.

22. The debugging method as set forth in claim 21, 
wherein the debug command uses a transmission function 

contained in a source program. 

23. The debugging method as set forth in claim 20, 
wherein the operation (a) and the receiving of the opera- 
tion (c) are performed corresponding to a server pro- 
cess. 

24. The debugging method as set forth in claim 23,
wherein the debug command is a status display command.

25. The debugging method as set forth in claim 24, 
wherein the reply information is displayed in a basic 

window of a display of the management processor. 

26. The debugging method as set forth in claim 24,
wherein the status display is displayed in a basic window

of a display of the management processor. 

27. The debugging method as set forth in claim 26,
wherein a process window for displaying the same source

program of the processor elements or displaying dif-
ferent source programs of the processor element in
parallel is generated on the display.

28. The debugging method as set forth in claim 23,
wherein the debug command uses a transmission function

contained in a source program.

29. The debugging method as set forth in claim 28,
wherein the reply information is displayed in a basic

window of a display of the management processor.

30. The debugging method as set forth in claim 28, 
wherein the status display is displayed in a basic window 

of a display of the management processor. 

31. The debugging method as set forth in claim 30, 
wherein a process window for displaying the same source 

program of the processor elements or displaying dif- 
ferent source programs of the processor element in 
parallel is generated on the display. 

32. A debugging system for use with a data parallel 
processing apparatus having a plurality of processor ele- 
ments for processing a program in parallel, the debugging 
system comprising: 

a plurality of sequential debuggers to debug process 
programs of the processor elements, wherein each 
processor element has a local memory to store data 
peculiar to each said processor element and a common 
memory to store data common to all processor ele- 
ments; 

a management processor to output a debug command to 
said sequential debuggers, causing said sequential 
debuggers to debug the process programs, to receive 
reply information therefrom as the debugged results, 
and to manage a debugging process for the data parallel 
processing apparatus by adding identification to the 
reply information representing that the reply informa- 
tion from each respective sequential debugger has been 
received; and 

a window, formed on a display of said management
processor, to display the same source program or different
source programs of the processor elements.

33. A readable storage medium for storing a program for 
causing a computer to perform: 

(a) outputting a debug command to a plurality of process 
elements for processing the same parallel program, 
wherein each process element has a local memory to 
store data peculiar to each said process element and a 
common memory to store data common to all process 
elements; and 

(b) receiving reply information from a sequential debug- 
ger to debug data parallel processed programs of the 
process elements corresponding to the debug command 
that has been output at the operation (a) and to manage 
a debugging process of the process programs by adding 
identification to the reply information representing that 
the reply information from each respective processing 
element has been received. 

34. A debugging system for use with a data parallel 
processing apparatus having a plurality of processor ele- 
ments for processing the same process in parallel, the 
debugging system comprising: 

a plurality of sequential debuggers to debug process
programs of the processor elements, wherein each
processor element transmits reply information in
response to receipt of a debug command such that
during execution of the respective debug process pro-
grams each processor adds identification information to
the reply information indicating that the reply informa-
tion from each respective sequential debugger has been
received; and

a management processor to output the debug command to
said sequential debuggers, to cause said sequential
debuggers to debug the process programs, and to
receive the reply information therefrom as debugged
results.

35. A debugging system for use with a data parallel
processing apparatus having a plurality of processor ele-
ments for processing the same process in parallel, the
debugging system comprising:

a plurality of sequential debuggers to debug process
programs of the processor elements, wherein each
processor element transmits reply information in
response to receipt of a debug command;

a management processor to output the debug command to 
said sequential debuggers, to cause said sequential 
debuggers to debug the process programs; and 

a server processor to control a corresponding sequential
debugger in response to an instruction of said manage-
ment processor such that the server processor adds iden-
tification information to the reply information indicat-
ing that the reply information from each respective
sequential debugger has been received.
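
The server processor recited in claims 2 and 35 sits between the management processor and a sequential debugger, using socket communication on one side and pipe communication on the other, and adds identification to the reply. The following Python sketch illustrates that arrangement under stated assumptions: the child process is a hypothetical stand-in for a sequential debugger, and the names `DEBUGGER_SOURCE`, `serve_one_command`, `run_demo` and the "id: reply" format are invented for illustration.

```python
import socket
import subprocess
import sys

# Hypothetical stand-in for a sequential debugger: it answers each command
# line read from standard input with one reply line on standard output.
DEBUGGER_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('done ' + line.strip(), flush=True)\n"
)


def serve_one_command(conn: socket.socket, debugger_id: int) -> None:
    """Server side: socket towards the manager, pipes towards the debugger."""
    child = subprocess.Popen(
        [sys.executable, "-c", DEBUGGER_SOURCE],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    try:
        command = conn.recv(1024).decode()      # command arrives by socket
        child.stdin.write(command + "\n")       # forwarded through a pipe
        child.stdin.flush()
        reply = child.stdout.readline().strip() # reply returns through a pipe
        # Identification is added before the reply goes back to the manager.
        conn.sendall(f"{debugger_id}: {reply}".encode())
    finally:
        child.stdin.close()
        child.wait()
        child.stdout.close()


def run_demo(command: str) -> str:
    """Manager side: send one command and return the identified reply."""
    manager_end, server_end = socket.socketpair()
    with manager_end, server_end:
        manager_end.sendall(command.encode())
        serve_one_command(server_end, debugger_id=1)
        return manager_end.recv(1024).decode()
```

Calling `run_demo("STEP")` sends one command through the socket, relays it over the pipe, and returns the reply prefixed with the debugger number. A real arrangement would keep one long-lived debugger per processor element rather than starting a child for each command.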

* * * * *
[57] ABSTRACT

Two microprogrammed processors of identical hard- 
ware and software facilities are synchronously operated 
for exchange operations with a plurality of external 
store and peripheral units. The exchange connections 
from the processors to the external units pass through 
gate circuits one of which only is operative at a time. 
Each processor has an output at which all processed 
words, whether instructions, microinstructions, oper- 
ands or results are successively available. Comparator 
means permanently compare such words from the two 
processors and, when a mismatch occurs, block the 
gate circuits and their own comparison outputs and 
control execution in both processors of a failure detec- 
tion and check up routine. Each processor includes 
exchange control claiming and exchange control dis- 
claiming outputs. When a processor activates its ex- 
change control disclaiming output, its gate means are 
blocked, the output of the comparator means is inhib- 
ited and the other processor is automatically qualified 
to proceed with the exchanges. Casually, the proces- 
sors are then desynchronized. 

7 Claims, 1 Drawing Figure

ands and instruction and microinstruction codes during
a task of the processors. Consequently, in such an orga-
nization of the system as herein above described, a very
early detection of failure will be ensured. If, further,
one of the circuits such as 10 to 12 and 14, or 60 to 62
and 64, itself fails, the failure in the processor that it
normally detects will, just later, be detected by the
comparator means proper.

As already described too, when both comparators
issue a mismatch signal, a check up routine is initiated
in each processor, said check up routine beginning by a
task interrupt microprogram for preserving the context
of the interrupted task and continuing by a systematic
check up of all the functional components of the pro-
cessors. Normally, at the end of such a check up rou-
tine, one of the processors must have already activated
its control disclaiming output, 35 for 1 or 85 for 51,
thus controlling inhibition of the synchronization in the
time bases 5 and 55 and inhibition of the comparator
means (more definitely continuation of such an inhibi-
tion of the comparator means); when 35 is activated, it
further inhibits the gate means 9 and applies to the
input 91 of the two-condition member 30 a signal re-
questing the switching of the condition of said member,
if necessary for giving the exchange control to the other
processor; when 85 is activated, it also inhibits the gate
means 59 and applies to the input 41 of 30 a signal
requesting the switching of said member to a condition
giving the exchange control to processor 1.

Thereafter, normally, the other processor must acti-
vate its exchange control claiming output, 36 for 1, 86
for 51, and the execution of the executed task is rein-
stated and ensured by the said processor.
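
The sequence just described (mismatch, check up routine in each processor, one processor disclaiming and the other claiming exchange control) may be modelled by the following Python sketch. It is a simplified software analogy of what the text describes as hardware: the `Processor` class, its `self_check` callable and the `run_lockstep` function are hypothetical names, and the word streams are plain integer lists.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Processor:
    name: str
    # Hypothetical check up routine; True means no failure was found.
    self_check: Callable[[], bool]


def run_lockstep(words_a: List[int], words_b: List[int],
                 proc_a: Processor, proc_b: Processor) -> Optional[str]:
    """Compare the two processed-word streams position by position.

    Returns None when no mismatch occurs. After a mismatch it returns the
    name of the processor that keeps exchange control, or "undecided" when
    the check up routines do not single out exactly one faulty processor.
    """
    for a, b in zip(words_a, words_b):
        if a != b:
            # Mismatch: both gates are blocked and each processor
            # runs its own check up routine.
            a_ok, b_ok = proc_a.self_check(), proc_b.self_check()
            if a_ok and not b_ok:
                return proc_a.name   # B disclaims, A claims control
            if b_ok and not a_ok:
                return proc_b.name   # A disclaims, B claims control
            return "undecided"       # left to a more elaborate test
    return None
```

The "undecided" outcome corresponds to the situation in which further testing is needed to decide which processor has failed.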

Though of doubtful possibility, it may be that, once
the check up routines have executed, both processors still
claim the control of exchanges with the peripheral and
external store units. A more elaborate test program
must then be requested and executed for deciding which
processor is actually the subject of a failure. This may
be ensured as follows: an AND-circuit 50 is connected
across the outputs 36 and 86 of the processors, and it
must be understood that said AND-circuit is only un-
blocked from the acquit instruction of a check up rou-
tine and is normally inoperative, or is unblocked from
the simultaneous activation of the inputs 34 and 84 of
the comparators; for instance, 50 may include a bista-
ble member set to work at simultaneous activations of
34 and 84, or of 36 and 86, and reset to rest when one of
such activations disappears. The output of 50 is con-
nected to respective test program inputs of the proces-
sors, and consequently such a program of test is re-
quested each time the AND-circuit 50 activates its
output. It may be noted that such a program of test may
have recourse to an external unit since the connections
6 and 56 from the external units to the processors are
never blocked and one at least of the gate means 9 and
59 is unblocked (or else, the request of such a program
from an external unit may be directly provided by the
activation of the output of 50). Such a program will
check the processors for execution of typical instruc-
tions and, normally again, one of the processors must
have activated its control disclaiming output before the
last instruction of said program. In the utmost im-
probable case that such a program could not obtain such a
result, the task will be continued with "random" attri-
bution of the exchange control to one of the processors
as herein above explained for defining the initial condi-
tion of the two-condition member 30.
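
The escalation described above may be summarized by the following Python sketch. It is an illustration only: the function name `resolve_exchange_control`, the `test_program` callable and the `rng` parameter are hypothetical, and the case in which neither processor claims control is not addressed by the text, so the sketch simply returns `None` for it.

```python
import random


def resolve_exchange_control(claim_1: bool, claim_51: bool,
                             test_program, rng=None):
    """Decide which processor (1 or 51) receives exchange control.

    `claim_1` and `claim_51` are the states of the claiming outputs after
    the check up routines. `test_program` is a hypothetical callable that
    runs the more elaborate test and returns the two claims afterwards.
    """
    rng = rng or random.Random()
    if claim_1 and claim_51:
        # Both still claim control: the AND-circuit output requests
        # the more elaborate program of test.
        claim_1, claim_51 = test_program()
    if claim_1 and not claim_51:
        return 1
    if claim_51 and not claim_1:
        return 51
    if not claim_1 and not claim_51:
        # Assumption: not covered by the text; no processor is selected.
        return None
    # Still tied: "random" attribution of the exchange control.
    return rng.choice([1, 51])
```

The test program is invoked only when both claiming outputs are active, mirroring the role of the AND-circuit, and the random choice is reached only if that test also fails to separate the two processors.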



As is apparent, once the comparators are inhibited, they
remain inhibited until the interrupted task is completed
and, for complete security in this respect, each inhibit-
ing input of the comparators may be made a memory
preserving one (a mere bistable member actuated from
activation of such an inhibiting input will suffice in this
respect).

What is claimed is: 

1. A data processing system comprising the combina- 
tion of two microprogram operated processors of iden- 
tical hardware and software facilities and identical 
coupling connections to a plurality of external store 
and peripheral units, gate means in each of the connec- 
tions outputting from the processors to said external 
units, two-condition circuit means for controlling said 
gate means in reverse conditions of conduction, means 
synchronizing the operations of the processors during 
execution of a task, outputs in said processors at which 
the processed data are simultaneously available in their 
sequence of occurrence, comparator means having 
inputs connected to said processor outputs and an out- 
put responsive to a mismatch between the inputting 
data thereof connected to inhibiting inputs of the said 
gate means and to respective interrupt inputs of the 
processors initiating simultaneous check up routine 
therein and further outputs of the processors to the said 
two-condition circuit means individually responsive to 
the result of the completion of such a check up routine 
in each processor for controlling the said two-condition 
circuit means and masking the output of the said com- 
parator means for the completion of the interrupted 
task after completion of the said routine. 

2. A data processing system according to claim 1, 
wherein each processor further comprises local failure 
detecting circuitry and a task interrupt terminal re- 
sponsive to the activation of said circuitry for disabling 
said synchronizing means and said comparator means 
and controlling the condition of the said two-condition 
circuit means for blocking the gate means of the pro- 
cessor wherein a failure identification routine is initi- 
ated. 

3. A data processing system comprising a central 
unit, a plurality of external store and peripheral units, a 
common inputting channel from said external units to 
said central unit and a common outputting channel 
from said central unit to said external units, wherein: 

said central unit comprises first and second simulta- 
neously operating, time-base synchronized micro- 
programmed processors of identical hardware and 
software organizations, 

each processor having a data output connection to 
the said common outputting channel through data 
transfer gate means, 

each processor further having a bus output on which 
all the codes of the words involved in the execution 
of any task successively appear, 

each processor further having a first task interrupt 
input the activation of which initiates a checkup 
routine in the processor, and first and second 
checkup result responsive outputs, one of which is 
activated for a positive result of the checkup and 
the other of which is activated for a negative result 
of the checkup, 

a two-condition member having distinct actuation
inputs and a pair of complementary condition out-
puts respectively controlling in reciprocal transfer
conditions the data transfer gate means of said first
and second processors, one of the said actuation
inputs being connected to the first checkup respon-
sive output of the first processor and to the second
checkup responsive output of the second processor
and the other one of the said actuation inputs being
connected to the second checkup responsive out-
put of the first processor and to the first checkup
responsive output of the second processor, and,

code comparator means having respective inputs
connected to the bus outputs of the first and sec-
ond processors and a mismatch output connected
to both the said first task interrupt inputs of the
processors, to inhibiting inputs of both the said
data transfer gates and to a code comparator
means self-inhibiting input.

4. A data processing system according to claim 3,
wherein said code comparator means comprise first
and second code comparator circuits each having in-
puts connected to the bus outputs of the processors,
each having its output connected to its own self-inhibit-
ing input through a gate connection, the first code
comparator circuit having said gated output connected
to the first task interrupt input of the first processor and
to the inhibiting input of the data transfer gate means of
said first processor, the second code comparator circuit
having its gated output connected to the first task inter-
rupt input of the second processor and to the inhibiting
input of the data transfer gate of said second processor,
and wherein said first and second code comparator
circuits have outputs connected to respective inputs of
an exclusive-OR circuit the output of which is con-
nected to inhibit inputs of the said gated outputs on the
occurrence of a mismatch condition between the inputs
of the said exclusive-OR circuit.

5. A data processing system according to claim 3,
wherein each of the said first and second processors
includes a local failure detecting organization, the out-
put of which is connected to a second task interrupt
input the activation of which initiates a failure localiza-
tion routine in the processor, said second task interrupt
input being connected to an inhibit input of the data
transfer means of said processor, to the inhibit input of
the said code comparator means and to the actuation
input of the said two-condition member which is con-
nected to the second checkup responsive output of the
processor.

6. A data processing system according to claim 5,
wherein both second checkup responsive outputs and
both second task interrupt inputs of the processors are
further connected to a time-base synchronizing inhibit-
ing input of the time bases of the processors.

7. A data processing system according to claim 3,
wherein a diagnostic routine control circuit is con-
nected across the said first checkup responsive outputs
of the processors and has its output connected to diag-
nostic routine initiating inputs of the said processors.

* * * * *
[57] ABSTRACT

A method provides error protection in a multiprocessor 
central control unit of a switching system wherein a 
number of central processors (CP, IOC) as well as a 
central memory (CMY) are connected in parallel to a 
central bus system (B:CMY0/B:CMY1). The proces- 
sors include dual highly-synchronous parallel driven 
processor units (PU) —apart from a possible tolerable 
positive timing slip — and integral error detection cir- 
cuits (V), as well as an integral local memory (LMY), in 
the ROM-area of which test program sections are 
stored for testing the respective processors (CP, IOC). 
Upon the detection of an error by at least one of the 
error detection circuits (V) of a processor (for example 
CPx), in the respective processor (CPx), at least if the 
error is not immediately correctable, the error detection 
circuit (V in CPx) starts isolating the respective proces- 
sor (CPx) from the bus system (B:CMY). The respective 
processor (CPx) starts the read-out of the test program 
sections, stored in its own local memory (LMY), for 
localizing and identifying the error source and/or the 
defect causing such errors. 
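
The method summarized above (isolate the processor, then run test program sections from its own local memory) may be illustrated by the following Python sketch. It also follows the description in recording, as an error code, the address of the failing test program section in a diagnostic register REG. The class and attribute names, and the representation of test sections as callables keyed by address, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Processor:
    name: str
    # Hypothetical test program sections held in the ROM area of the
    # local memory: address -> test callable, True meaning "passed".
    rom_tests: Dict[int, Callable[[], bool]] = field(default_factory=dict)
    on_bus: bool = True
    diagnostic_register: Optional[int] = None

    def handle_error(self, correctable: bool) -> None:
        """React to a signal from an error detection circuit."""
        if correctable:
            return  # immediately correctable errors do not cause isolation
        self.on_bus = False  # isolate the processor from the bus system
        # Run the locally stored test sections in address order and record
        # the address of the first failing one as the error code.
        for address in sorted(self.rom_tests):
            if not self.rom_tests[address]():
                self.diagnostic_register = address
                break
```

Because the tests are read from the processor's own local memory, they can run after isolation without any access to the central bus, which is the point the abstract emphasizes.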

10 Claims, 1 Drawing Sheet

ples and general characteristics are previously known
and described in detail in these prior patent applications,
it is unnecessary to describe the construction and opera-
tion of the present central control unit in complete
detail again. Instead it is adequate to discuss only upon
the particular matters and techniques as they relate in
accordance with the present invention.

DETAILED DESCRIPTION

The sole FIGURE includes central processors desig-
nated CP0 . . . CPU, IOC0, IOC1 . . . , with integrated
redundant processor building blocks, PU, e.g. with
integrated double 32-bit microprocessor chips PU.
These processor units PU are driven in synchronous
step in parallel, apart from a possible tolerable timing
slip, and therefore carry out the actual error protected
interconnects, for which purpose they contain, or have
been associated with, at least one independent error
detection circuit, e.g. EDC circuits
and/or comparison circuits V, particularly for the im-
mediate checking of instructions which includes com-
mands and/or data being processed by both processor
units PU, of the respective processor. Each of the pro-
cessors CP, IOC contains individual local memory
LMY, LMY:IO, with a RAM section and in particular
a ROM section, i.e. a PROM section, which store at
least partially similar error diagnostic programs namely
test program sections for self testing the respective
processor, and which store switching program sections,
in particular the most frequent and/or most quickly
needed switching program sections required by the
respective processor CP, IOC.

The central memory CMY, to which the central
processors CP, IOC have access over the dual bus sys-
tem B:CMY0/B:CMY1, stores at least various seldom
and/or not immediately needed switching program
sections required by a central processor CP, IOC, as
well as data (accessible at least temporarily) for sev-
eral or for all processors CP, IOC which data concern
a multiplicity of connections existing at that time, and
concern peripheral system device features.

As soon as an error is detected by at least one of the
error detection circuits, for instance, by one of the pro-
cessor comparison circuits V in the respective proces-
sor, for example in CPx, at least if this error is not easily
correctable, the output signal of the respective error
detection circuit starts a process to isolate the respec-
tive processor CPx from the bus system B:CMY0/B:CMY1,
e.g. by means of an I/O unit, and starts the
read-out of the respective test program section for such
trouble, stored in the ROM-part of the local memory
LMY (and/or LMY:IO). Thereupon the two processor
units PU of this processor CPx begin to process this test
program for the more specific location and/or typifica-
tion of the respective error or of the defect which
causes such errors. Through the isolation of the respec-
tive processor CPx, the propagation of the error
through the entire switching system is avoided. In other
words, a high tolerance for errors is achieved. The
availability is restored to provide a minimum of down-
time.

The subsequent failure diagnosis of the defective
processor is also especially rapidly and simply possible
if, in the event that an indication of an error/defect is
established during test processing, an error code corre-
sponding to this error/defect, for example, the address
of the [illegible] test program section command, is stored
in a diagnostic register REG of the respective processor
CPx.

When an error/defect has been adequately localized
and/or typified, it is even possible, immediately after
storing the error code, to interrupt processing of the test
program.

If the defective processor CPx is simply left isolated
from the bus system B:CMY it may immediately be
[illegible] during normal distribution of switching
[illegible] that [illegible] does not result in a
[illegible] remains limited.

The remote diagnosis of all errors/defects and the
corresponding appropriate preparation for the subse-
quent [illegible] the central control unit is possible, by
means of a [illegible] operation and main-
tenance [illegible] the
bus system B:CMY0/B:CMY1 at least to each respec-
tive processor CPx, although this respective processor
CPx is isolated from the bus system B:CMY0/B:CMY1,
for interrogating the contents of its diagnostic register
REG via a special interrogating code.

In the event that a direct indication of the error/
defect of only one of the two processor units PU oc-
curred, e.g. by an associated EDC circuit or an associ-
ated parity bit network of this respective processor unit,
this respective processor unit PU alone may be
isolated and normal processor operation may be main-
tained via the other processing unit PU of this respec-
tive processor, thus leaving the central control unit in
its previous high state of availability. It is also desirable,
during that isolation of the single processor unit PU, to
log an indication of this condition for the maintenance
service staff, for example, again in the diagnostic regis-
ter, which may even be remotely interrogated by
the operation and maintenance station. The repair of
this processor may then be included in preparing proce-
dures for the subsequent maintenance.

There has thus been shown and described a novel
arrangement and technique for providing error protec-
tion in multiprocessor central control unit of a switch-
ing system which fulfills all the objects and advantages
sought therefore. Many changes, modifications, varia-
tions and other uses and applications of the subject
invention will, however, become apparent to those
skilled in the art after considering this specification and
the accompanying drawing which disclose the pre-
ferred embodiments thereof. All such changes, modifi-
cations, variations and other uses and applications

immediately introduced self testing simplifies the subse- 60 which do not depart from the spirit and scope of the 

quent error diagnosis for maintenance service staff, and invention are deemed to be covered by the invention 

the possible repair necessary is very specific, and in which is limited only by the claims which follow, 

general limited to a small portion or section of the re- We claim: 

spective processor. In the event that no indication of an 1. A method of operating an error protected high 
error/defect is established during test processing, the 65 availability multiprocessor serving as a central control- 
respective processor CPx reconnects itself again, pref- ler of a switching system providing inter-connections to 
erably on its own, with the bus system B:CMY0/B:- subscribers, particularly a telephone switching system, 
CMY1, so that the original extremely high operational the central controller comprising: 
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(a) a plurality of central processors (CP, IOC), each 
central processor including dual, apart from a pos- 
sible tolerable timing slip, parallel synchronously 
driven processor units (PU) for carrying out the 
inter-connections to subscribers connected to said 
switching system, including at least one integral 
error detection circuit (V) for immediately check- 
ing instructions processed by both of the dual pro- 
cessor units (PU) of the respective processor, and 
including a local memory (LMY, LMY:IO) having
a ROM-section, storing test program sections for 
self testing of the respective processor (CP, IOC), 
and storing switching program sections required 
most frequently and most quickly by the respective 
processor (CP, IOC), 

(b) a central main memory (CMY) including a ROM- 
area storing at least seldom and not immediately 
required (CP, IOC) switching program sections, 
and including a memory-area storing at least tem-
porarily data accessible for a number of or for all
processors (CP, IOC), whereby such data concern 
inter-connections between subscribers and concern 
peripheral system elements, and 

(c) a central bus system (B:CMY) to which are con-
nected in parallel the processors (CP, IOC) and the
main memory (CMY), after detecting an error by at 
least one of the error detection circuits (V) or a 
processor (CPx, for example), at least if this error is 
not immediately correctable, the method compris- 
ing the steps: 

(a) isolating the respective processor (CPx) from 
the bus system (B:CMY); 

(b) starting to read-out test program sections stored 
in ROM-section of its own local memory (LMY, 
LMY:IO) by the respective processor (CPx); and 

(c) processing this test program for localization and 
identification of the error and a defect causing 
such errors by the respective processor (CPx). 

2. A method in accordance with claim 1, further 
comprising the step: 

(d) reconnecting the respective processor (CPx) to 
the bus system (B:CMY) if no indication of an error 
and/or defect is detected during the processing of 
the test program. 

3. A method in accordance with claim 1, further
comprising the step:

(d) storing an error code obtained from its own local
memory corresponding to this error/defect in an
integral diagnostic register (REG) of the respective
processor (CPx) if an indication of an error and/or
defect is detected during processing of the test
program.

4. A method in accordance with claim 3, said error 
code indicative of an address of that test program com- 
mand or of those test program commands, at which the 
respective error defect was localized and identified. 

5. A method in accordance with claim 3, further 
comprising the step: 

(e) interrupting the processing of the test program by 
the respective processor (CPx) after the error code 
is stored. 

6. A method in accordance with claim 3, further 
comprising the step: 

(f) maintaining isolation of the respective processor
(CPx) after the error code is stored. 

7. A method in accordance with claim 6, further 
comprising accessing each processor (CPx) isolated 
from the bus system (B:CMY) via a special code by a 
particular processor and interrogating the contents of 
the diagnostic register (REG) of this isolated processor 
(CPx) to diagnose the processor being isolated. 

8. A method in accordance with claim 1, further 
comprising the step: 

directly monitoring each of the dual processor units
(PU) by each error detection circuit and directly
indicating an error or a defect in a respective one of
the dual processor units (PU), isolating the respec-
tive processor unit (PU) alone, whereby the normal
processor operations are maintained via the other
processor unit (PU) alone within this respective
processor.

9. A method in accordance with claim 8, further
comprising storing an indication of this isolated one of 
the processor units (PU) in a diagnostic register of the 
respective processor. 

10. A method in accordance with claim 5, wherein a 
special processor accesses each processor (CPx) iso- 
lated from the bus system (B:CMY0/B:CMY1) via a 
special code, and interrogates the contents of the diag- 
nostic register (REG) to diagnose the processor being 
isolated. 
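
The error-handling sequence recited in claims 1 to 3 and 6 can be sketched as follows. This is only an illustrative model, not the patented implementation: the class `Processor`, the fields `on_bus` and `diagnostic_reg`, and the functions `handle_error` and `interrogate` are hypothetical names, and the self test is passed in as a callable that returns an error code or `None`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Processor:
    """Hypothetical model of one central processor (CP or IOC)."""
    name: str
    on_bus: bool = True                    # connected to the bus system B:CMY
    diagnostic_reg: Optional[int] = None   # REG; None means no error code stored


def handle_error(cpu: Processor,
                 run_self_test: Callable[[], Optional[int]]) -> None:
    """Isolate the processor, self-test it, then reconnect or record a code."""
    cpu.on_bus = False                     # step (a): isolate from the bus
    error_code = run_self_test()           # steps (b), (c): test program from local ROM
    if error_code is None:
        cpu.on_bus = True                  # claim 2: nothing found, reconnect
    else:
        cpu.diagnostic_reg = error_code    # claim 3: store code; stays isolated (claim 6)


def interrogate(cpu: Processor) -> Optional[int]:
    """Claims 7 and 10: another processor reads REG of an isolated processor."""
    return cpu.diagnostic_reg
```

In this sketch a processor whose self test finds nothing returns to the bus on its own, while a processor with a localized defect stays isolated and exposes only its diagnostic register.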
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BASIC-ABSTRACT: 



The process control system has a data input terminal (OE) that allows
communication with a process via a data processing module (OV) and an
output interface (AE). Within the input terminal are an input keyboard
(ET) and a V.U (SG).

The data processing module has two independently operating
microcomputers (MC1, MC2) and a safety relay logic unit (RV). When a
command is generated, the data are held in a memory in the output
interface (AE) and are transmitted to both microcomputers, which
execute a check routine. Provided that the same result is reached, the
command is allowed to be transmitted to the output. When an auxiliary
command mode is employed, the second processor (MC2) outputs to a
special display (KA) for operator verification. Only when an input key
(FS) is actuated is the command executed.

ADVANTAGE - Allows safe operation to be imposed on auxiliary mode. 
ABSTRACTED-PUB-NO: EP 120339A 
EQUIVALENT-ABSTRACTS : 

A device for the reliable process control employing two
microcomputers which are independent of one another and do not operate
so as to be safety-oriented and which commonly act upon the process
which is to be controlled, and allow both control operations, whose
reliability is tested in a separate safety plane outside the
microcomputer, and also auxiliary operations, whose reliability is no
longer tested, to be carried out, in particular for the control of a
railroad signalling device of at least one operating location,
characterised in that the one microcomputer (MC1) converts the process
control instructions which are present in order to be carried out,
into corresponding command data, and stores them in an output device
(AE) and re-reads the stored data, where the re-read data are
simultaneously fed to the other microcomputer (MC2) by means of a
safety-oriented input doubler (EV), that both microcomputers classify
the data, which are fed thereto, independently of one another in
accordance with the respectively present process control instruction
and feed the classification results to a relay connection (RV) which,
when the classification results of the two microcomputers are
identical, causes the release of the data stored in the output device
(AE), if the respectively classified process control instruction
relates to a control operation, but during common recognition of an
auxiliary operation makes the release of the data stored in the output
device dependent upon a separate agreement of an operator, which is
fed via the relay connection (RV).
(8pp) 
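
The release decision attributed to the relay connection (RV) can be illustrated with a short sketch. It assumes each microcomputer reports its classification as one of two strings, "control" or "auxiliary"; the function name `release_allowed` and that encoding are hypothetical and do not come from the abstract.

```python
def release_allowed(class_mc1: str, class_mc2: str,
                    operator_agrees: bool = False) -> bool:
    """Decide whether the data held in the output device (AE) may be released.

    class_mc1 / class_mc2 are the independent classifications produced by
    MC1 and MC2 (assumed encoding: "control" or "auxiliary").
    """
    if class_mc1 != class_mc2:
        return False              # classifications differ: no release
    if class_mc1 == "control":
        return True               # jointly recognized control operation
    if class_mc1 == "auxiliary":
        return operator_agrees    # auxiliary operation needs operator agreement
    return False                  # unknown classification: stay safe
```

The default of `operator_agrees=False` keeps the sketch on the safe side: an auxiliary operation is never released unless the agreement is supplied explicitly.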

EP 120339B 

Device for reliable process control employing two microcomputers
(MC1, MC2) which are independent of one another, do not operate in a
safety-oriented manner and act jointly on the process to be
controlled, and in doing so permit the execution of both control
operations whose admissibility is checked on a separate safety level
outside the microcomputer, and auxiliary operations whose
admissibility is not checked further, and of which one (MC1) converts
the process control instructions present for execution in each case
into corresp. command data and forwards them to an output device (AE),
via which they can be switched further to the process, the command
data being temp. stored and forwarded to an operator for checking in
the case of an auxiliary operation, in partic. for controlling a
railway signalling device from at least one operating terminal,
characterised in that one of the microcomputers (MC1) re-reads the
data stored in the output device (AE), the re-read data being
simultaneously also fed to the other microcomputer (MC2) via a
safety-oriented input doubler (EV), in that both microcomputers
(MC1, MC2) classify independently of one another the data fed to them
in accordance with the type of the respective process control
instruction present and feed the classification results to a relay
connection (RV) which, when the classification results of the two
microcomputers match, initiates the release of the data stored in the
output device (AE) if the respective classified process control
instruction relates to a control operation, whereas in the case of
joint recognition of an auxiliary operation it makes the release of
the data stored in the output device dependent upon the agreement of
the operator notified to it, and in that the microcomputer (MC2) not
directly receiving the process control instructions in each case,
where an auxiliary operation is concerned, converts the command data
sent to it via the input doubler (EV) into the corresp. process
control instru
CHOSEN-DRAWING: Dwg.1/1 
DERWENT-CLASS: Q21 T01 T06 
EPI-CODES: T01-J07; T06-A03; T06-A08; 
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ABSTRACT : 

1. A device for the reliable process control employing two
microcomputers which are independent of one another and do not operate
so as to be safety-oriented and which commonly act upon the process
which is to be controlled, and allow both control operations, whose
reliability is tested in a separate safety plane outside the
microcomputer, and also auxiliary operations, whose reliability is no
longer tested, to be carried out, in particular for the control of a
railroad signalling device of at least one operating location,
characterized in that the one microcomputer (MC1) converts the process
control instructions which are present in order to be carried out,
into corresponding command data, and stores them in an output device
(AE) and re-reads the stored data, where the re-read data are
simultaneously fed to the other microcomputer (MC2) by means of a
safety-oriented input doubler (EV), that both microcomputers classify
the data, which are fed thereto, independently of one another in
accordance with the respectively present process control instruction
and feed the classification results to a relay connection (RV) which,
when the classification results of the two microcomputers are
identical, causes the release of the data stored in the output device
(AE), if the respectively classified process control instruction
relates to a control operation, but during common recognition of an
auxiliary operation makes the release of the data stored in the output
device dependent upon a separate agreement of an operator, which is
fed via the relay connection (RV).
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Programmable control unit with two processors and program memory. . . 

. .Abstract (Basic) : The programmable control unit contains an information 
bus connecting two processors and a program memory contg. 
interpretation type and basic or standard instructions. Signals are 
transferred. . . 



... USE /ADVANTAGE - Has improved internal geometry and processing 
efficiency. (43pp Dwg.No.2/36) 

...Abstract (Equivalent): commands to be executed by said processor; a data 
memory for temporarily storing data; a fixed memory for storing a 
self- diagnostic program; a communication interface participating in a 
communicating operation with a host computer; an I... 

...internal bus for mutually connecting said processor, said 1-bit 

processor, said data memory, said fixed memory, said communication 
interface and said I/O interface... 

...A programmable controller comprising: a first processor ; a second 
processor ; a program memory system for storing interpreter type
program commands and basic commands; a first... 



bus through which said first processor is connected to said program 
memory system; and a second information bus through which said 
second processor is connected to said first processor ; whereby 
said first processor reads a program command and judges whether said 
program command be executed by said first processor or said second 
processor ; when said first processor judges that said program 
command be executed by said first processor, said first processor 
itself operates, said first processor reads said program command from 
said program memory preparatory to execution of said program 
command and at the same time issues a pseudo command to said second 
processor ; and when said first processor judges that said program 
command be executed by said second processor , said first processor 
supplies said program command to said second processor in place of 
said pseudo command. . . 

Abstract (Equivalent) : a read only memory (ROM) for storing a self- 
diagnostic program. . . 




US005553297A 

United States Patent [19] [11] Patent Number: 5,553,297

Yonezawa et al. [45] Date of Patent: Sep. 3, 1996 



[54] INDUSTRIAL CONTROL APPARATUS 

[75] Inventors: Masaaki Yonezawa; Kiyoshi 

Hasegawa; Yasunori Kawata; Kouji 
Matsuoka; Takasi Kadowaki, all of 
Tokyo, Japan . 

[73] Assignee: Yokogawa Electric Corporation, 
Musashino, Japan 

[21] Appl. No.: 119,322 
[22] Filed: Sep. 9, 1993 

Related U.S. Application Data 

[63] Continuation of Ser. No. 513,454, Apr. 23, 1990, abandoned. 

[30] Foreign Application Priority Data 

Apr. 24, 1989 [JP] Japan 1-104052 

May 23, 1989 [JP] Japan 1-129707 

Oct. 23, 1989 [JP] Japan 1-275512

Oct. 23, 1989 [JP] Japan 1-275513

[51] Int. Cl.6 G06F 15/16

[52] U.S. Cl. 395/800; 364/132; 364/136;

364/DIG. 1

[58] Field of Search 395/800; 364/131-136 

[56] References Cited 

U.S. PATENT DOCUMENTS 

4,303,990 12/1981 Seipp 395/740
4,592,010 5/1986 Wollscheid 395/800
4,600,988 7/1986 Tendulkar et al. 395/550
4,648,068 3/1987 Ninnemann et al. 395/281
4,750,110 6/1988 Mothersole et al. 395/375
4,797,811 1/1989 Kiyokawa et al. 364/474.23
4,858,101 8/1989 Stewart et al. 364/131
4,942^52 7/1990 Merrill et al. 395/826
4,972,365 11/1990 Dodds et al. 395/825
4,985,831 1/1991 Dulong et al. 395/650
5,068,778 11/1991 Kosem et al. 364/138
5,068,821 11/1991 Sexton et al. 395/800
5,287,548 2/1994 Flood et al. 395/375

OTHER PUBLICATIONS 

MC68881/MC68882 Floating-Point Coprocessor User's
Manual, 1987, pp. 7-1 to 7-38.

M68000 Family Reference, 1988, pp. -62 to 3-63, 3-66 to
3-67, 4-32 to 4-35, 4-50 to 4-51, 4-6 to 4-6, and 4-74 to
4-75.

Primary Examiner— William M. Treat 
Attorney, Agent, or Firm— Pennie & Edmonds 



[57] 



ABSTRACT 



A programmable controller is disclosed that incorporates a 
sequence control program adapted to input and output 
information to a variety of local intelligent appliances at a 
high velocity and a high efficiency in the field of factory 
automation for efficiently controlling factory production 
lines or the field of process automation for controlling 
multiple industrial processes. Provided for this purpose is an
improved change-over system associated with a 1-bit
processor and an ordinary processor for executing the
sequence control program. Also improved are functions for
a user to particularly specify a BASIC program, software/
hardware control functions for a group of I/O boards for
transferring signals to and receiving signals from the local
appliances, and the ability to program the internally stored
sequence control program.

7 Claims, 29 Drawing Sheets 
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FIG. 15 is a diagram representing a software program in 
CPU board 100 that carries out parallel operations of 
sequence control processing SQ and BASIC processing 
BAS. 

Sequence control processing SQ is intended to execute a 
string of program commands such as those shown in FIGS. 
12 and 13, while BASIC processing BAS is associated with 
a data readout program which is particularly created by the 
user, or is a standard communication program, written in 
BASIC. An execution right change-over processing unit ES,
a timer T, and a task scheduler TS have functions which are 
defined by software stored inside CPU board 100. Task 
scheduler TS functions to determine the priority for execu- 
tion with respect to several processes in the BASIC pro- 
cessing program. This function is not, however, directly 
associated with the operations of the present invention. 

It should be noted that generally several hundreds or 
several thousands of sequence control program commands 
are provided, and the execution time of one cycle of that
program is several tens or hundreds of milliseconds (ms), 
depending largely on the configuration of the system to be 
controlled. For this example, regardless of what the execu- 
tion time for one cycle is, it will be assumed that 10 ms is 
the time set in timer T. The time set in timer T is not limited
to 10 ms but may be arbitrarily set depending on the system 
configuration. 

As shown in the example of FIG. 15, the first step of 
sequence control processing SQ is to execute a load com- 
mand LD. Subsequently, the execution moves to an AND 
command, an OR command, and an output command 
(OUT), at which time 10 ms passes. Then, a time-up signal 
is transmitted as an interrupt signal from timer T to execu- 
tion right change-over processing unit ES. CPU 1 inputs this 
interrupt signal, but at this time the control executing right 
is still held by BPU 2. As a result, CPU 1 is unaffected by 
this time-up interrupt. 

The sequence processing further advances, and BPU 2 
executes load command LD. By the time the next 10 ms 
time-up interrupt signal is transmitted, the control executing 
right has been handed over to CPU 1 for execution of 
application command (1). Consequently, execution right 
change-over processing unit ES changes the processing 
which is to be executed by CPU 1 from sequence processing 
SQ to BASIC processing BAS. Thus, CPU 1 starts process- 
ing BASIC processing program BAS in accordance with 
task scheduler TS. Execution of sequence control processing 
SQ remains stopped. 

Thereafter, timer T is again brought into a time-up state, 
and another 10 ms interrupt signal is generated. At this time, 
execution right change-over processing unit ES in CPU 1 
temporarily stores the present conditions of the BASIC 
program (e.g., values of registers, value of a program 
counter, etc.) in data memory 7 (shown in FIG. 11), and 
causes the operation of sequence control processing SQ, 
which had been stopped, to resume. 

When execution of sequence control processing SQ is 
completed, sequence control processing SQ transmits a 
notice of completion to execution right change-over pro- 
cessing unit ES, irrespective of whether a time-up signal has 
been issued, and CPU 1 initiates or resumes execution of
BASIC processing BAS. 

In sum, the system operates by alternating execution of 
sequence control processing SQ and BASIC processing 
BAS every 10 ms while CPU 1 has the execution right. As 
a result, sequence processing SQ and BASIC processing 
BAS appear to the user as if they were being executed 
simultaneously. Hence, multi-processing is practicable. 
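
The change-over behaviour described above can be modelled in a few lines. This is a simplified sketch rather than the firmware of CPU board 100: the function names `on_time_up`, `on_sequence_complete` and `trace` are hypothetical, and whether CPU 1 holds the execution right at each 10 ms time-up is supplied as an input flag instead of being derived from the sequence program.

```python
SQ, BAS = "SQ", "BAS"   # sequence control processing / BASIC processing


def on_time_up(running: str, cpu1_has_right: bool) -> str:
    """Return what CPU 1 processes after a time-up interrupt from timer T."""
    if not cpu1_has_right:
        return running                  # BPU 2 holds the right: interrupt has no effect
    return BAS if running == SQ else SQ  # unit ES swaps SQ and BAS


def on_sequence_complete() -> str:
    """SQ reports completion to ES: BASIC runs next, with or without a time-up."""
    return BAS


def trace(rights_at_each_tick):
    """Replay a series of time-up interrupts, starting in sequence processing."""
    running, history = SQ, []
    for cpu1_has_right in rights_at_each_tick:
        running = on_time_up(running, cpu1_has_right)
        history.append(running)
    return history
```

Calling `trace([False, True, True])` replays the walk-through of FIG. 15: the first interrupt is ignored while BPU 2 holds the right, the second switches CPU 1 to BASIC processing, and the third resumes sequence processing.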
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On the other hand, some system configurations do not 
need parallel execution of sequence control processing and 
BASIC processing. In some cases, either sequence control 
processing or BASIC processing is unnecessary, depending 
on modifications of the system. In that situation, CPU board 
100 incorporates a program such as that shown in the 
flowchart of FIG. 16, and the system operates as follows. 

After the power is turned ON and a self-diagnostic 
operation is executed, it is determined whether a working 
language is designated by the host computer. If not, as a 
default, the system proceeds as if the sequence language 
were specified: a system language table necessary for pro- 
cessing the sequence language is created in data memory 7 
and sequence language processing begins. If a language is 
designated by the host computer, the program depicted in 
FIG. 16 checks to determine which language is specified. If 
the available language is a sequence language, the program 
proceeds as in the default situation described above. If the 
BASIC language is designated, a system table required for 
processing the BASIC language is created in data memory 
7 and BASIC language processing begins. If both the 
sequence language and the BASIC language are designated, 
system tables necessary for processing both the sequence 
language and the BASIC language are created in data 
memory 7 and processing of both proceeds. 

In this manner, the program creates the system tables 
corresponding to the languages designated, thereby facili- 
tating the designation of the processing languages from the 
host computer, e.g., a desktop computer. 
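
The start-up decision of FIG. 16 reduces to a small selection rule, sketched below. The language labels "sequence" and "basic" and the function name `tables_to_create` are assumptions made for illustration; the returned set simply names the system tables that would be created in data memory 7.

```python
from typing import FrozenSet, Iterable, Optional

KNOWN_LANGUAGES = frozenset({"sequence", "basic"})


def tables_to_create(designated: Optional[Iterable[str]]) -> FrozenSet[str]:
    """Return the languages whose system tables are created at start-up."""
    languages = frozenset(designated or ())
    if not languages:
        # No designation from the host computer: default to the sequence language.
        return frozenset({"sequence"})
    unknown = languages - KNOWN_LANGUAGES
    if unknown:
        raise ValueError(f"unsupported language(s): {sorted(unknown)}")
    return languages   # one table per designated language, possibly both
```

Rejecting unknown labels is a choice of this sketch; the description does not say how an unsupported designation is handled.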

Next, a system as depicted in FIG. 17 will be described 
that uses a standard I/O driver set in CPU board 100 and thus 
is capable of utilizing various types of I/O boards. 

An operating system OS in CPU board 100 receives an 
access request a1 for an I/O board C from a user program UP
(such as a BASIC language program stored in CPU board
100) or an interrupt request a2 from I/O board C. Access
request a1 from user program UP prompts an I/O request
receiving process (1) to start, while interrupt request a2 
prompts an interrupt request receiving process (2) to start. 

When starting up the system, a standard I/O driver SD
according to the present invention generates process defini-
tion tables TBL1, TBL2, ..., TBLn for the individual I/O
boards fitted to respective inter-unit slots. (Collectively, the
tables are designated TBL; an arbitrary one of process
definition tables TBL will be referred to as TBLi, i=1 to n.)
When an I/O request receiving process (1) or an interrupt 
request receiving process (2) is begun, preparation for 
accessing the process definition tables TBL starts. 

Upon initiating either process (1) or (2), a determination 
is made as to whether the I/O board to be accessed is 
classified as a register/interface type or a command/interface 
type, and then a corresponding main processing is executed. 
An example of a register/interface type I/O board is the 
ordinary I/O board C1 depicted in FIG. 11. The command/
interface type I/O board is typified by I/O board C2 which, 
as illustrated in FIG. 11, incorporates a microprocessor to 
transfer and receive commands. 
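
The type-dependent dispatch described above might look like the following sketch. The enum `Interface`, the dictionary standing in for the process definition tables TBL, and the function `main_processing` are all hypothetical; the returned strings merely label which main processing would run.

```python
from enum import Enum
from typing import Dict


class Interface(Enum):
    REGISTER = "register/interface"   # e.g. ordinary I/O board C1
    COMMAND = "command/interface"     # e.g. I/O board C2 with its own microprocessor


def main_processing(slot: int, tables: Dict[int, Interface]) -> str:
    """Select the main processing for the I/O board fitted to the given slot."""
    kind = tables[slot]               # look up the table entry for this slot
    if kind is Interface.REGISTER:
        return f"slot {slot}: register access"
    return f"slot {slot}: command exchange"
```

For example, with `tables = {1: Interface.REGISTER, 2: Interface.COMMAND}`, a request for slot 2 is routed to the command-exchange branch.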

Common or standard operations for I/O request receiving 
process (1) and interrupt request receiving process (2) are 
performed as a common processing routine and are executed 
regardless of the type of interface of the I/O board. In 
addition, a special processing routine is executed for I/O 
boards requiring a different processing from the standard 
processing. At start-up, the special processing routine is 
uploaded to standard I/O driver SD from the I/O board that 
requires special processing. Also at start-up, an address of 
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FIG. 26 shows an example of the contents of comment file 
CF. FIG. 27 represents an example of the contents of 
circuit/comment corresponding table CCT. Comment file CF 
may also be stored in RAM 7 in CPU board 100 rather than 
in the programming tool. Additionally, a comment file which 
corresponds the step numbers to the subcomments may be 
added. 

The operation of seeking out specific locations of the 
ladder circuits on the programming tool will be described 
with reference to FIG. 28. The programming tool reads the 
contents of comment file CF and circuit/comment corre- 
sponding table CCT, which is preset to indicate subsequent 
display pictures S1, S2, S3, and S4 on the CRT display unit.

An initial picture displayed on the CRT display unit of the 
programming tool is a circuit monitor menu picture S1, from
which picture a circuit comment display picture S2 is 
selected. Then, a circuit comment list is displayed in a list 
format on the CRT. From the circuit comment display 
picture S2, a subcomment display picture S3 is further 
selected, and thereafter all the subcomments included in the 
circuit comment are displayed. When a circuit display 
picture S4 is selected corresponding to the subcomment, the 
ladder circuit corresponding to this subcomment is dis- 
played. 

In particular, in order to specify a certain ladder circuit
from the ladder program shown in FIG. 24, a desired circuit 
comment is selected by displaying a list of the circuit 
comments, and a specific ladder circuit can be displayed on 
the CRT display by designating a ladder circuit correspond- 
ing to the subcomment included in this circuit comment. 

Note that the circuit picture S4 may be selected from the 
circuit monitor menu picture S1 or the circuit comment list
S2. Page update scrolling and page update monitoring can be 
effected on the respective display pictures. 

As stated above, in a programmable controller according 
to the invention, the comment file and the circuit/comment 
corresponding table are set and then read out. A correspon- 
dence between the ladder circuits and a variety of comments 
added to the ladder circuits is established to enable a 
hierarchical display on a CRT. Thus, it is possible to
immediately detect a desired ladder circuit. 

The arrangement described above provides easy-to-search
ladder circuits, useful for adjusting the circuits or locating an 
error. The description will next focus on improvements in 
creating the ladder program and on operation during debug- 
ging. 

FIG. 29 shows a flowchart of a processing routine of a 
programmable controller according to the present invention. 
A sequence control process which uses a ladder program is 
executed by a processing routine consisting of common 
processing such as a self-diagnosis, I/O refresh processing of 
I/O registers in an I/O board, execution of the stored ladder 
program, and service processing for a host appliance. 

In a programmable controller according to the invention, 
the I/O refresh process subsequent to the common process is, 
as illustrated in FIG. 29, omitted in order to perform 
programming and debugging when creating the sequence 
control program without mounting the I/O board. With this 
arrangement, the debugging operation can be done, without 
the I/O board, in conformity with an instruction from a 
debugger such as a programming tool or the like. 

FIG. 30 is a diagram showing a programming function of 
the ladder program in a programmable controller according 
to the present invention. Respective blocks in the Figure 
represent software-functional blocks of the programmable 
controller. 
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The individual functional blocks in FIG. 30 work as 
follows. A circuit editing function 201 is for editing respec- 
tive circuit components of the ladder circuit, with which a 
programmer describes addresses of the respective circuit 
components by using signal names similar to device names 
used when designing the ladder circuit. A signal defining 
function 202 serves as a unit for storing in a table format a 
correspondence of the addresses to the signal names of the 
respective circuit components. A compile function 203 
transmits to a sequencer system 205 a program in an 
executable format with reference to the signal names in the 
ladder circuit. The corresponding addresses are supplied
from signal defining function 202 and from an address 
automatic generating function 204. Address automatic gen- 
erating function 204 automatically assigns detailed 
addresses to the signal names supplied from signal defining 
function 202. 

Procedures for creating the ladder program by utilizing 
these functions are described below. 

A ladder circuit such as that depicted in FIG. 31 is 
generated in cooperation with the programming tool and 
circuit editing function 201. The individual circuit compo- 
nents of a relay unit, an output unit, and so on are set in the 
form of signal names such as SW1 and COIL1. As shown in 
FIG. 32, signal defining function 202 arranges signal names 
SW1, SW2, COIL1, COIL2, IRL1, TTM1, CNT1 and REG1
to correspond to addresses X, X, Y, Y, I, T, C and D, where 
the symbol X is an address representing an input, Y is an 
output address, I is an internal relay address, T is a timer 
address, C is a counter address, and D is a data register 
address. 

The detailed addresses are automatically assigned in
address automatic generating function 204. For example, for 
the address X, set to circuit element SW1, a detailed address
X001 is assigned. The detailed addresses are set sequentially 
to correspond to the number of the signal names of the 
ladder circuit components. In this example, the address 
X001 is set to the signal name SW1, and the address X002 
is set to the signal name SW2. The results for all the 
components of FIG. 32 are shown in FIG. 33. 
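The sequential assignment described above can be illustrated with a short Python sketch. It is not taken from the specification: the function name, the table layout, and the three-digit numbering restarting at 001 for each address class are assumptions. Only the results X001 for SW1 and X002 for SW2 are stated in the text; a subset of the signal names of FIG. 32 is used.

```python
from collections import defaultdict


def assign_detailed_addresses(signal_table):
    """Assign sequential detailed addresses per address class.

    signal_table is a list of (signal_name, address_class) pairs in the
    order the signal names were defined. The three-digit numbering that
    restarts at 001 for every class is an assumption for illustration.
    """
    counters = defaultdict(int)
    detailed = {}
    for name, addr_class in signal_table:
        counters[addr_class] += 1
        detailed[name] = f"{addr_class}{counters[addr_class]:03d}"
    return detailed


# A subset of the signal names and address classes listed for FIG. 32.
SIGNAL_TABLE = [
    ("SW1", "X"), ("SW2", "X"),
    ("COIL1", "Y"), ("COIL2", "Y"),
    ("IRL1", "I"), ("CNT1", "C"), ("REG1", "D"),
]

ADDRESSES = assign_detailed_addresses(SIGNAL_TABLE)
```

With this table, `ADDRESSES["SW1"]` is `"X001"` and `ADDRESSES["SW2"]` is `"X002"`, matching the example in the text; the remaining entries follow the assumed numbering.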

While processing is performed in sequencer system 205 in 
accordance with the program in executable format produced 
by compile function 203, debugging can be carried out 
solely by CPU board 100 without mounting the I/O board, 
because the I/O refresh process is, as illustrated in the 
flowchart of FIG. 29, bypassed if the I/O board is not 
mounted at the debugging stage. Based on the results of 
debugging, detailed addresses are added as needed. 
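A minimal Python sketch of a scan cycle that bypasses the I/O refresh when no I/O board is mounted is given here for illustration. The class `IOBoard`, the function `scan_cycle`, the dictionary used as signal memory, and the ordering of refresh before program execution are all assumptions; the flowchart of FIG. 29 is not reproduced.

```python
class IOBoard:
    """Hypothetical stand-in for a mounted I/O board."""

    def __init__(self):
        self.refresh_count = 0

    def refresh(self):
        # A real board would exchange input and output data here.
        self.refresh_count += 1


def scan_cycle(memory, program, io_board=None):
    """Run one scan of the sequence program.

    The I/O refresh is skipped when io_board is None, so the program can
    be debugged on the CPU board alone.
    """
    if io_board is not None:
        io_board.refresh()
    program(memory)
    return memory


def demo_program(memory):
    # Illustrative rung: COIL1 follows SW1.
    memory["COIL1"] = memory.get("SW1", False)
```

Calling `scan_cycle({"SW1": True}, demo_program)` runs the rung without any board, while passing an `IOBoard` instance additionally performs one refresh per scan.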

Thus, the signal names can automatically correspond to 
the detailed addresses without a programmer being aware of 
the addresses of the respective ladder circuit components, 
which in turn facilitates a design of the sequence control 
program. Debugging can be accomplished without an I/O 
board, and the ladder program operations can be debugged 
before finishing the design of a relay board which corresponds 
to the sequence process. 

A programmable controller according to the present 
invention typically performs programming with a ladder 
program having thousands of steps by splitting the program 
into blocks corresponding to the steps. FIGS. 34(a) through 
34(c) show programming modes. 

FIGS. 34(a), 34(b), and 34(c) represent steps 1, 2, and 3 
to 5, respectively, of a part of a ladder program. 

A start command "ACT PROG1.2" and an end command 
"INACT PROG1.1" are set at the final substep of step 1. 
With this arrangement, when a control operation of step 1 
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reaches the final substep, a block ladder program PROG1.2 
of step 2 is started by stopping a block ladder program 
PROG1.1 of step 1, thereby initiating the control operation 
of step 2. 

Set at the final substep of the block ladder program 
PROG1.2 are a stop command "INACT PROG1.2", for 
stopping the block ladder program PROG1.2, and parallel 
start commands "ACT PROG1.2", "ACT PROG2.1", and 
"ACT PROG3.1", for simultaneously starting steps 3 to 5. 

The ends of steps 3 and 4 are monitored in step 5. When 
detecting the cessation of steps 3, 4, and 5, the operation 
returns to step 1, i.e., the starting step of the sequence control 
process, in response to stop command "INACT PROG3.1" 
and start command "ACT PROG1.1". 

The start commands "ACT" and the stop commands 
"INACT" of the ladder program as thus defined make it 
possible to effect programming in parallel by splitting a 
series of thousands of sequence control programs into several 
blocks. In addition, the ladder programs inform each 
other of starting and ending, whereby the sequence control 
operation can be carried out by storing the ladder programs 
split into blocks in a plurality of CPU boards. 
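The effect of the ACT and INACT commands on the set of running blocks can be sketched in Python as follows. The class `BlockScheduler`, the function `run_final_substep`, and the representation of commands as (operation, block) pairs are assumptions made for illustration; only the command names and the block names PROG1.1 and PROG1.2 come from the description of step 1.

```python
class BlockScheduler:
    """Hypothetical tracker of which block ladder programs are active."""

    def __init__(self, initially_active=()):
        self.active = set(initially_active)

    def act(self, block):
        # "ACT <block>": start the named block ladder program.
        self.active.add(block)

    def inact(self, block):
        # "INACT <block>": stop the named block ladder program.
        self.active.discard(block)


def run_final_substep(scheduler, commands):
    """Apply the commands stored at the final substep of a block."""
    for op, block in commands:
        if op == "ACT":
            scheduler.act(block)
        elif op == "INACT":
            scheduler.inact(block)
        else:
            raise ValueError(f"unknown command: {op}")
    return scheduler.active


# Final substep of step 1 as described: start PROG1.2, stop PROG1.1.
STEP1_FINAL = [("ACT", "PROG1.2"), ("INACT", "PROG1.1")]
```

Starting from `BlockScheduler(["PROG1.1"])`, applying `STEP1_FINAL` leaves only PROG1.2 active, which corresponds to the hand-over from step 1 to step 2. A final substep that issues several ACT commands would leave several blocks active at once.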

Turning to FIG. 35, an example is shown where a control 
object M (a product on a production line) on a control line 
L is controlled by combining a CPU board 101 in which a 
BASIC program is stored, and CPU boards 102 to 104 in 
which only ladder programs are stored. 

Ladder programs LD1 and LD2 are stored in CPU board 
102, to which an I/O board group C10 is connected; ladder 
programs LD3 through LD5 are stored in CPU board 103, to 
which an I/O board group C20 is connected; and a ladder 
program LD6 is stored in CPU board 104, to which an 
I/O board group C30 is connected. 

FIG. 36 illustrates an example of the BASIC program 
stored in CPU board 101. 

Ladder programs LD1 through LD6 are defined as a series 
of sequence control programs with respect to control object 
M, these ladder programs having the block construction 
described above and being programmed independently. 

According to the example shown in FIG. 36, when 
starting operation, the BASIC program stored in CPU board 
101 sequentially actuates ladder programs LD1 and LD2 of 
CPU board 102. After these programs have ended, ladder 
program LD3 or LD4 of CPU board 103 is executed in 
accordance with the sequence processing results at that time, 
indicated by COND1. Immediately after finishing the program 
LD3 or LD4, the BASIC program functions to start in 
parallel both ladder program LD5 incorporated into CPU 
board 103 and ladder program LD6 incorporated into CPU 
board 104. 
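The start order described for this example can be sketched in Python as a list of stages, where each stage lists the ladder programs started together. This is not the BASIC program of FIG. 36: the function name `plan_sequence` and the choice that a true condition selects LD3 (and a false one LD4) are assumptions, and LD6 is taken to be the ladder program stored in CPU board 104.

```python
def plan_sequence(cond1):
    """Return the stages in which the ladder programs would be started.

    cond1 stands for the sequence processing result that selects between
    LD3 and LD4; which value selects which program is an assumption.
    """
    stages = [["LD1"], ["LD2"]]                 # CPU board 102, sequential
    stages.append(["LD3"] if cond1 else ["LD4"])  # CPU board 103, one of two
    stages.append(["LD5", "LD6"])               # boards 103 and 104, parallel
    return stages
```

For example, `plan_sequence(True)` yields `[["LD1"], ["LD2"], ["LD3"], ["LD5", "LD6"]]`, the last stage holding the two programs started in parallel.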

In accordance with the illustrated system, sequence control 
programs in which a series of sequence control operations 
are divided into blocks are handled and processed by 
a plurality of programmable controllers. Thus, highly efficient 
sequence control processing can be done. 

As discussed above, a programmable controller according 
to the invention is capable of improving the speed of 
sequence control processing and attaining an easy-to-redesign 
system when modification is desired. A programmable 
controller exhibiting a high processing efficiency can thus be 
realized. 

Although the illustrative embodiments of the present 
invention have been described in detail with reference to the 
accompanying drawings, it should be understood that the 
present invention is not limited to those precise embodi- 
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ments. Various changes or modifications may be made by 
one skilled in the art without departing from the scope or 
spirit of the invention, and are limited only by the scope of 
the claims. 
What is claimed is: 

1. A programmable controller comprising: 

one or more I/O boards for transferring and receiving 
multiple information to and from a control object; and 

a processor board for imparting a control signal to the 
control object via the I/O boards, the processor board 
comprising: 

first processor means for controlling the processor board, 
for executing commands of a sequence control program 
type, and for executing commands of a BASIC pro- 
gram type for performing general-purpose arithmetic 
processing, information processing, or a control opera- 
tion, by starting executing the commands of sequence 
control program type and making an end instruction; 

program memory means for storing the commands of 
sequence control program type; 

1-bit processor means, connected directly to the program 
memory means, for executing commands to be 
executed that are sequentially read from the program 
memory means and transmitting the commands to be 
executed to the first processor means if the commands 
to be executed are of the sequence control program type 
to be executed by the first processor means; 

a random access memory (RAM) for temporarily storing 
data; 

a read only memory (ROM) for storing a self-diagnostic 
program; 

a communication interface for communicating with a host 
computer; 

an I/O bus through which I/O boards for transferring and 
receiving multiple information are connected; 

an I/O interface connected to the I/O bus; and 

an internal bus for mutually connecting the first processor 
means, the 1-bit processor means, the RAM, the ROM, 
the communication interface, and the I/O interface. 

2. The programmable controller as claimed in claim 1, 
wherein at a start-up time an I/O driver stored in the 
processor board stores a process definition table for storing 
in a table format information read from an I/O board 
regarding a board ID, a type of interface of the I/O board, the 
number of channels, a command register address, a buffer 
address, a data register address, an address for designating 
special processing when special processing is needed, and 
wherein the I/O driver refers to the process definition table 
when effecting a data outputting process. 

3. The programmable controller as claimed in claim 1, 
further comprising: 

strobing signal generating means for generating strobing 
signals provided in the processor board and the I/O 
boards, wherein when starting a data transfer cycle, the 
processor board requests a data transfer by transmitting 
to one I/O board a number of strobing signals, each 
strobing signal making transmission of an information 
frame effective, and wherein the one I/O board receiving 
the data transfer request transmits to the processor 
board a number of other strobing signals, each strobing 
signal making transmission of an information frame 
effective, thus ending the data transfer cycle. 

4. A programmable controller as claimed in claim 1, 
further comprising one or more abnormality detecting cir- 
cuits that correspond to different kinds of abnormalities 
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which can arise, wherein the first processor means receives 
an abnormality detecting signal as an interrupt signal and 
stores, in a table format in memory, time data from an 
internal timer regarding the occurrence of the detected 
abnormality and a description of the detected abnormality. 

5. A programmable controller as claimed in claim 1, 
further comprising: 

means for assigning signal names to circuit elements 
when creating ladder circuits; 

means for corresponding the signal names to addresses in 
accordance with a preset signal name-to-address cor- 
respondence table; and 

means for sequentially assigning detailed addresses cor- 
responding to the signal names, 

wherein an I/O refresh process in a sequence control 
process routine is omitted when an I/O board is not 
connected with the controller. 

6. A programmable controller as claimed in claim 1, 
wherein a series of sequence control operations are split into 
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blocks corresponding to several steps when programming a 
ladder program, and wherein the ladder program is stored 
and executed block by block by storing in a final substep of 
each block a command for stopping the execution of that 
block and a command for specifying the block to be 
executed next. 

7. A programmable controller as claimed in claim 1, 
further comprising: 
means for storing a comment file for storing comments 
added to ladder circuits in a ladder program generated 
in a ladder language and corresponding step numbers of 
the ladder circuit concerned; and 

means for storing a circuit/comment table in which the 
step numbers of the ladder circuit concerned correspond 
to comment numbers in the comment file, 

wherein the circuit comments, the step numbers and the 
ladder circuits are read from a programming tool. 

* * * * * 
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We present a framework that allows translation of predicated code into the static single 
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predicated code. In particular, we represent predicate join points in the program by the 
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Fortran programs is examined. Although extrapolation is difficul ... 
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Cache miss characterization models such as the three Cs model are useful in developing 
schemes to reduce cache misses and their penalty. In this paper we propose the OPT 
model that uses cache simulation under optimal (OPT) replacement to obtain a finer and 
more accurate characterization of misses than the three Cs model. However, current 
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We describe the Slice Processor micro-architecture that implements a generalized 
operation-based prefetching mechanism. Operation-based prefetchers predict the series 
of operations, or the computation slice that can be used to calculate forthcoming memory 
references. This is in contrast to outcome-based predictors that exploit regularities in the 
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based prefetching mechanisms such as stream buffers where the ... 
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We present a method that combines a deep analysis of program dependences with a 
broad analysis of the interaction among procedures. The method is more efficient than 
existing methods: we reduce many tests, performed separately by existing methods, to a 
single test. The method is more precise than existing methods with respect to references 
to multi-dimensional arrays and dependence information hidden by procedure calls. The 
method is more general than existing methods: we accommodate potent ... 

9 Metadatabase solutions for enterprise information integration problems 
Cheng Hsu, Laurie Rattner 

January 1993 ACM SIGMIS Database, Volume 24 Issue 1 
Publisher: ACM Press 


The success of modern information technology in the past decades has brought about the 
proliferation of systems dedicated to individual groups of applications and functions. This 
proliferation, in turn, has led to the need for enterprise-wide management and integration 
of information, and has triggered major efforts such as systems integration, re- 
engineering, and computer integrated manufacturing. Nonetheless, achieving such 
integration remains a challenge. To effectively manage information reso ... 
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The Mesa programming language supports program modularity in ways that permit 
subsystems to be developed separately but to be bound together with complete type 
safety. Separate and explicit interface definitions provide an effective means of 
communication, both between programs and between programmers. A configuration 
language describes the organization of a system and controls the scopes of interfaces. 
These facilities have had a profound impact on the way we design systems and organize 
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Expert systems are being used to govern the intelligent control of the Robotic Air Vehicle 
(RAV) which is currently a research project at the Air Force Avionics Laboratory. Due to 
the nature of the RAV system the associated expert system needs to perform in a 
demanding real-time environment. The use of a parallel processing capability to support 
the associated computational requirement may be critical in this application. Thus, parallel 
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This paper analytically studies the performance of a synchronous conservative parallel 
discrete-event simulation protocol. The class of models considered simulates activity in a 
physical domain, and possesses a limited ability to predict future behavior. Using a 
stochastic model, it is shown that as the volume of simulation activity in the model 
increases relative to a fixed architecture, the complexity of the average per-event 
overhead due to synchronization, event list manipulation, looka ... 
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A probabilistic analysis is employed to determine the effect of hierarchical storage 
organizations on information retrieval operations. The data storage hardware is assumed 
to consist of n-levels of linearly connected memory hardware with increasing data access 
times and increasing data storage capabilities. A system might, for example, consist of 
fast semiconductor memory, computer core memory, extended core storage, disk 
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Many system errors do not emerge unless some intricate sequence of events occurs. In 
practice, this means that most systems have errors that only trigger after days or weeks 
of execution. Model checking [4] is an effective way to find such subtle errors. It takes a 
simplified description of the code and exhaustively tests it on all inputs, using techniques 
to explore vast state spaces efficiently. Unfortunately, while model checking systems code 
would be wonderful, it is almost never done in pra ... 
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Cloud cover photographs transmitted from meteorological satellites must be processed 
and interpreted before weather maps can be issued. Most of the routine processing can be 
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handled by present day digital computer techniques; however, the recognition and 
interpretation of cloud patterns such as vortices indicating hurricanes, must still be 
performed by humans due to the lack of suitable recognition mechanisms. This paper 
investigates the feasibility of using a perceptron-type computer for t ... 

17 Toward real time simulation: prototyping of a large scale parallel ground target 
simulation 

John B. Gilmer, David W. O'Brien, Jeffery E. Payne 

December 1990 Proceedings of the 22nd conference on Winter simulation 
Publisher: IEEE Press 




18 A general purpose design automation file system 
T. Beretvas, C. H. Liu, R. L. Taylor 

January 1967 Proceedings of the 4th conference on Design automation 
Publisher: ACM Press 


This file system grew out of exploratory work in the file system-terminal area. The main 
reason for such a project was to explore file organizations that would be useful for a wide 
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April 1994 ACM SIG ARCH Computer Architecture News , Proceedings of the 21ST 

annual international symposium on Computer architecture ISCA '94, Volume 22 Issue 2 

Publisher: IEEE Computer Society Press, ACM Press 


The characteristics of several commercial and technical workloads on the DEC 7000 AXP 
system are compared using built-in hardware monitors. The data analyzed include total 
instructions, cycles, multiple-issued instructions, stall components, cache misses, and 
instruction types. The data indicates that the two classes of workloads have vastly 
different characteristics and impose different requirements on the system design. 
Compared to VAX, Alpha AXP takes advantage of lower cycles per instruction ... 

20 The preconditioned conjugate gradient method on the hypercube 
G. Abe, K. Hane 

January 1989 Proceedings of the third conference on Hypercube concurrent 
computers and applications - Volume 2 

Publisher: ACM Press 


A parallel algorithm for solving the elliptic partial differential equation (PDE) is described 
in this paper through the finite difference method (FDM). The Concurrent Preconditioned 
Conjugate Gradient method is developed to optimize processor load balancing. This 
algorithm is evaluated on a hypercube-based concurrent machine, the Intel iPSC. 
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